







THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 








Volume XXX January, 1939 Number 1 








THURSTONE’S WORK RE-WORKED 
C. SPEARMAN 


I. NEO-FACULTIES 


1. Historical.—Since the dawn of psychology some three thousand 
years ago, and right up to the present day, the theory of mental 
ability has been dominated by the concept of “faculties,” “powers”
or “capacities”; for instance, those of “intelligence,” “memory,”
“imagination,” and so forth. But the preceding quarter of a century
has seen this ancient theory of faculties at least challenged by that
of “Two Factors,” these being designated respectively as “G” and
“S.” And during the last two or three years, yet another advance
has been attempted; the theory of “two” factors has been developed
into that of “multiple” factors.

2. A New Research.—Up to the last few weeks or so, however, this 
further effort had been almost confined to the invention of a novel 
method. As for the data to which this was applied, these had been 
borrowed from other researches, and employed only for purposes of 
illustration. Also the actual results attained by this newly invented 
method had not differed notably from those reached by its predecessors. 

But quite recently there have been rumours—and even some 
fragmentary advance publications—of a research devoted specially 
to supplying the method with improved data; such as would really do 
it justice. 

And now at last this important project has been notably realized. 
A work has actually been produced by Thurstone which, in several 
respects at any rate, stands almost unrivalled.1 The tests employed
were as many as fifty-seven, and appear to have been selected and 
organized admirably. They have, too, been used with no less than 
two hundred forty university students; subjects which greatly 





1 Thurstone, L. L.: Primary Mental Abilities. Chicago: University of Chicago 
Press, 1939, Vol. rx, 121 pp. 




simplify the scientific situation, by virtue of having passed beyond the 
age usually taken to involve the complication of intellectual growth. 

3. Disappearance of G.—What, then, has been the chief outcome 
of all this auspicious undertaking? Surely, a matter for surprise. 
As foremost result there is now—perhaps for the first time—a complete 
disappearance of any general factor. Truly enough, such a factor had 
already been denied by several previous authors; but only to be in 
some guise or other forthwith reinstated. Thus, Thomson had made 
a strong protest that the variates representing abilities do not display 
“a ghost of a general factor.”1 But eventually, his repudiation had
softened to only rejecting any general factor of the nature of “a really
existing organ.” Again, Kelley had often been quoted as opposing
the general factor. But in truth, he had on the contrary emphatically
claimed to find one. He had only objected to a particular fashion of
interpreting it.2 Yet again, Guilford had hesitated to admit that
his analysis disclosed any genuine “G.” But if not, it had at least
produced something remarkably like this.3 Yet again, Thurstone
himself had recently analyzed Brigham’s abilities without any manifest
recourse to a general factor. But really this was all the time being
introduced by him under the mask of “oblique reference axes.”4
Our surprise at the absence of the general factor in Thurstone’s
present work is further enhanced by the fact of it having been most
emphatically demonstrated by Alexander working with Thurstone’s
own methods and even under his own guidance.5 Indeed, as we shall
see, at one stage of the operations in his present work itself, Thurstone 
arrives at a general factor in its extreme form; but later on it suddenly 
vanishes. 

4. “Faculties” Again.—Turning from this negative aspect of his
work to its chief positive contribution, we find it producing factors
in very large numbers, no less than thirteen of them. Seven are
described as dealing, respectively, with the following material: “space”
(S), “verbal relations” (V), “isolated words” (W), “memory” (M),
“induction” (I), “perceptual speed” (P), and “arithmetic” (N).
Then comes R which is described as the successful completion of a 





1 Thomson: Brit. J. Psych., Vol. VIII, 1916.

2 Kelley: Crossroads in the Mind of Man, 1928, p. 23. 

3 Guilford: Psychometric Methods, 1936, p. 505. 

4 Thurstone: The Vectors of the Mind, 1935, p. 167. 

5 Alexander: Intelligence Concrete and Abstract. Brit. J. Psych. Mon. Suppl.,
1935, p. 120.




task that involves some form of restriction in the solution. Then we 
get D which is said to be of a “deductive nature.” Next come three
more factors whose characterisation is not attempted. The thirteenth 
and final factor seems to be a sort of general residuum. 

Omitting this last factor, all the rest obtained in this fashion 
are found to have weights, loadings, or variances of astonishingly equal 
average magnitudes. Herein they stand at the opposite pole to those 
got by all previous multiple analysis, including the said centroid 
method; for with all those the weights had been found to be as much 
as possible unequal. 

To crown everything, seven of these factors are emphatically 
designated by this author as “primary abilities.” Hereby they appear
to indicate a return to the ancient doctrine of faculties. They
incomparably surpass these, no doubt, in respect of statistical basis. But
they join hands with them in reducing ability to a definite number of 
more or less watertight compartments. 


II. NEW METHOD


1. Centroid and Rotative Procedures.—From these disturbing results 
of the new method let us turn to consider this method itself. The 
chief difficulty here, as in most factorization, is the defect of this not 
being unique. To fit one and the same set of correlations, there can
in general be found an infinite number of different factorial systems. 
The method must, then, be such as somehow to make a reasonable 
choice between them. 
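
The indeterminacy just described may be illustrated by a minimal sketch in Python. The loadings below are hypothetical values chosen only for illustration and are not taken from Thurstone's data; the helper names `rotate`, `reproduced` and `max_difference` are likewise assumptions of this sketch. It turns two orthogonal reference axes through an arbitrary angle and checks that the correlations reproduced by the factors are unchanged.

```python
import math

# Hypothetical loadings of four tests on two orthogonal factors (illustrative values only).
LOADINGS = [
    (0.8, 0.1),
    (0.7, 0.2),
    (0.6, 0.5),
    (0.5, 0.6),
]

def rotate(loadings, angle_deg):
    """Rotate both reference axes by the same angle, so that they stay orthogonal."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

def reproduced(loadings):
    """Value reproduced for every pair of tests: the sum of products of their loadings."""
    n = len(loadings)
    return [[sum(p * q for p, q in zip(loadings[i], loadings[j])) for j in range(n)]
            for i in range(n)]

def max_difference(m1, m2):
    """Largest absolute difference between two matrices of the same shape."""
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

rotated = rotate(LOADINGS, 30.0)
fit_gap = max_difference(reproduced(LOADINGS), reproduced(rotated))
print(fit_gap < 1e-12)  # both factorial systems fit the same correlations
```

Any other angle would serve equally well, which is why some further rule is needed to single out one position of the axes.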

Now Thurstone for his part begins by employing the method 
invented by himself and commonly designated as “centroid.” It has
received this name because by it the axes of reference are so chosen
as to pass through the central positions of the respective factors
(or “vectors,” as he prefers to call them). On this basis the factors
are duly calculated by him and, naturally enough, turn out very like 
all those which he himself and others had obtained previously by this 
same centroid method; indeed, they are not unlike those obtained by 
all other previous methods of multiple factor analysis, such as those 
of Hotelling and Kelley. 

But now he introduces a further step. This whole centroid
determination of the axes is declared by him to be “arbitrary.” They need,
he says, to be rotated to new positions that are “meaningful.”

2. Eliminating Negative Weights.—And here at length we come 
upon his really decisive move. In his rule or rules for governing this 
















final rotation it is that we must look for the secret of his revolutionary 
results. 

At this point, however, we must confess to some disappointment. 
He does indeed inform us of one principle adopted by him. He writes 
as follows: 


By psychological considerations the hypothesis can be entertained that the
primary factors act positively unless they are absent from a performance.
If this assumption is correct, then the projections of the test vectors on the
coordinate axes should be either positive or zero. In rotating the axes, an
attempt has been made so to place them that the test vectors have no
significant negative projections, while the axes have been retained in their
mutually orthogonal relation.1


This indeed, so far as it goes, would appear to be quite justified. 
But clearly it does not go far enough. In particular, it by no means 
explains either the vanishment of G, or the advent of the remarkable 
neo-faculties. On the contrary, it had already been employed—in 
consultation with Thurstone himself—by Alexander, and in their 
hands it had led to the contrary results; it had neither eliminated the 
general factor nor introduced anything like primary faculties; far 
from upsetting the two-factor theory, it had supplied its most brilliant 
exposition.

3. Maximizing the Number of Zero Weights.—Well realizing himself, 
evidently, these limitations to the principle of eliminating the nega- 
tive weights, Thurstone proceeds to add a second. It is that of: 


Maximizing the number of entries in the factorial matrix that are zero, 
or nearly zero. 


But as regards details about this additional principle, he departs 
from his customary admirable explicitness. His whole manner of so 
rotating the axes as to achieve his revolution of psychology is dis- 
missed by him with the curt remark that: 


many different methods have been used in these successive rotations, and it 
would be impossible to describe them all. 


III. SUBMERSION IN ERROR 


1. Algebraic Shuffling.—In this default of information as to how
the said new principles actually lead to the novel factorization, we 
must needs try to answer this critical question for ourselves. And 





1 See work quoted in Note 1, p. 71. 










for the purpose we will quit the now popular geometrical representa- 
tion for the less elegant but sometimes psychologically more meaningful 
algebraic portrayal. 

Consider the following simple case: 


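\[
\begin{aligned}
x_i &= a_{i\alpha}\,\alpha_x + a_{i\beta}\,\beta_x + a_{i\gamma}\,\gamma_x \\
x_j &= a_{j\alpha}\,\alpha_x + a_{j\beta}\,\beta_x + a_{j\gamma}\,\gamma_x
\end{aligned}
\]

Here $x_i$ and $x_j$ are the abilities of the individual $x$ for the entire tests $i$
and $j$; $\alpha_x$, $\beta_x$, and $\gamma_x$ are his abilities for the mutually independent
factors α, β and γ; whilst the $a$'s are the respective weights, loadings,
or coefficients of these factors in these tests.

This situation can, of course, be expressed in the usual factorial
matrix. We get:

\[
\begin{array}{c|ccc}
 & \multicolumn{3}{c}{\text{Factors}} \\
\text{Tests} & \alpha & \beta & \gamma \\ \hline
i & a_{i\alpha} & a_{i\beta} & a_{i\gamma} \\
j & a_{j\alpha} & a_{j\beta} & a_{j\gamma}
\end{array}
\]

Now, it is evidently possible to analyze any of the factors further,
so as to make, for example:

\[ \alpha_x = b_{\alpha\delta}\,\delta_x + b_{\alpha\epsilon}\,\epsilon_x \]

where δ and ε are new factors and the $b$'s are the new weights. Moreover,
these latter can be so chosen as to make δ and ε independent of
one another as also of β and of γ (not of α). For this purpose we have
only to make the variance of $b_{\alpha\delta}\delta$ plus that of $b_{\alpha\epsilon}\epsilon$ equal to that of α.
Next, by means of algebraic shuffling, we easily get:

\[
\begin{aligned}
x_i &= a_{i\alpha}\,(b_{\alpha\delta}\,\delta_x + b_{\alpha\epsilon}\,\epsilon_x) + a_{i\beta}\,\beta_x + a_{i\gamma}\,\gamma_x \\
    &= (a_{i\alpha} b_{\alpha\delta}\,\delta_x + a_{i\beta}\,\beta_x) + (a_{i\alpha} b_{\alpha\epsilon}\,\epsilon_x + a_{i\gamma}\,\gamma_x)
\end{aligned}
\]

which may be written as:

\[ c_{i\zeta}\,\zeta_x + c_{i\eta}\,\eta_x \]

The factorial matrix may now be written:

\[
\begin{array}{c|cc}
 & \multicolumn{2}{c}{\text{Factors}} \\
\text{Tests} & \zeta & \eta \\ \hline
i & c_{i\zeta} & c_{i\eta} \\
j & c_{j\zeta} & c_{j\eta}
\end{array}
\]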

Thus, the weight of α has been transferred, partly to β, and partly
to γ. In general the factorial weights admit of great latitude in the




manner of redistribution. For the most recent and brilliant instance
of this shuffling, we may refer to Holzinger on what he calls, “the
general procedure . . . to solve for one group of factors in terms of
the other.”1 But long ago some invaluable work of this kind had
already been done by Maxwell Garnett.2

[Profile I.—Heaping. Profile II.—Levelling.]

2. Heaping, Levelling, and Scattering.—Here we will specially con- 
sider three typical cases, as illustrated by Profiles II to IV, where 
again the Greek letters represent factors, whilst the ordinates are 
their respective weights, loadings, or variances in some single test. 








[Profile III.—Scattering.]


In II the great bulk of the weight is heaped high up on the single 
factor α, leaving but little for β and nothing for γ or δ. In III, the
weights on the contrary all lie on one and the same level. In IV the 
weights still lie on one and the same level, but now they are scattered 
among more numerous factors. 





1 Holzinger: Student Manual of Factor Analysis, 1937, Ch. VII. 
2 Garnett: Proc. R. Soc., 1919.


By these types we are brought back to the experimental results 
of Thurstone.1 The heaping in II bears obviously a general
resemblance to the centroid curve. A levelling analogous to that which is
presented in III forms a conspicuous feature of the rotative curve. 
And finally the wide scattering distinctive of IV is manifested in both 
the one curve and the other. 

3. Phenomenon of Submersion.—But herewith our great question
does not yet find any reply. Indeed, we seem farther than ever from
understanding how the alleged principles could have led to the reported
results. Our profiles indicate rather the contrary. The number of
zeros is maximized by neither of the levelled profiles III and IV, but
instead by the heaped profile II.

[Profile IV.—Centroid and rotative factors of Thurstone. Horizontal axis: factors.]

An approach towards solving the mystery may, however, be 
gained by recalling that the maximizing advocated was not limited 
to pure zeros, but extended to values “nearly” so. That is to say,
the author, plausibly enough, treated as zero all such deviations from 
this as might possibly have been due merely to the experimental 
errors. 

Nevertheless, even this qualifying clause does not materially alter 
the situation so long as the errors involved are moderate in amount. 
Let us turn to V and VI which repeat our profiles II and III, but with 
the addition of shading within which all weights are reckoned as zero. 
We still, as before, find a maximum of zeros in the heaped distribu- 
tion, and none in those that are levelled, whether or not also scattered. 





1 But here we must always bear in mind that the types, for simplicity, deal 
with individual tests, whereas our extracts from Thurstone’s results refer to 
averages. 













But turn next to profiles VII and VIII, where the distribution,
besides being either heaped or else both levelled and scattered, also
has a large error. Now the whole situation becomes reversed. Not
the heaped but the levelled weights present the larger number of
zeros; in fact, VIII is zero throughout; the weights are wholly
submerged in error!
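
The reversal can be made concrete by a small sketch in Python, offered only as an illustration. The common variance of .36, the twelve factors, and the two widths of the error band are assumed values, not figures from Thurstone's tables; `near_zero_count` and `levelled` are hypothetical helper names. The sketch counts how many loadings would be reckoned as zero under a small and under a large error.

```python
import math

def near_zero_count(loadings, error_band):
    """Number of loadings reckoned as zero, i.e. lying within the error band."""
    return sum(1 for a in loadings if abs(a) <= error_band)

def levelled(total_variance, n_factors):
    """Spread a test's common variance equally over n_factors orthogonal factors."""
    return [math.sqrt(total_variance / n_factors)] * n_factors

# Hypothetical test with common variance 0.36, analysed into twelve factors in two ways.
heaped = [0.6] + [0.0] * 11      # all the weight heaped on one factor
spread = levelled(0.36, 12)      # equal weight on each factor, about 0.173 apiece

small_error, large_error = 0.05, 0.20   # assumed widths of the error band

for band in (small_error, large_error):
    print(band, near_zero_count(heaped, band), near_zero_count(spread, band))
```

With the small band the heaped profile shows eleven zeros and the levelled profile none; with the large band every one of the twelve levelled loadings falls inside the band, so that the levelled profile now shows the greater number of zeros.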

There may be mentioned yet another condition, by which the
weights or loadings will obviously tend towards being submerged.
This occurs when the correlations, and therefore, of course, the
loadings, are of small magnitude.

[Profile V.—Heaping and small error. Profile VI.—Levelling and small error. Profile VII.—Heaping and large error.]

4. Application to Present Study.—So far, we have seen that
such loadings as those of Thurstone can be produced from his principle 
of maximizing the number of zeros by the curious means that we have 
called submersion in error. Furthermore, we have found that this 
occurrence is chiefly produced by three conditions, which are: small 
correlations, large errors, and numerous factors. There remains for 




us the task of examining whether any such conditions actually pre- 
vailed in the present case. 

As regards the low size of the correlations, to take this first, the 
answer is clearly affirmative. The average correlation is no more than
about .30. Nor can this fact be blamed. The whole work had what
the French call “the defects of its qualities.” It included, as we saw,
no less than fifty-seven different tests. And for this comprehensiveness
of the whole set it had to pay the price of a very brief time being
allotted to each separately.

[Profile VIII.—Levelling, scattering and large error. Submersion.]

Out of the same comprehensiveness there inevitably arose the
second condition for submersion in error. The fact of including such
multitudinous tests made the labour of calculating all the correlations
almost prohibitive. The author was driven to the expedient of
replacing the normal coefficients by the tetrachoric kind. But the
effect of this was a very large increase of the errors of sampling. He
himself tells us that the probable size of such errors for his two hundred
forty subjects comes to about .07, whereas with the normal
product-moment coefficient it would only have been .04. Hence the two
hundred forty subjects treated by his method may be taken to have
produced less accurate results than would have been got from only
one hundred treated normally.
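
The arithmetic behind this comparison can be checked with a short sketch in Python. It assumes the classical probable-error formula for a product-moment correlation, 0.6745(1 - r²)/√N, which the text does not state explicitly, and it takes r = .30 from the average correlation reported above. The function names are hypothetical.

```python
import math

def probable_error_r(r, n):
    """Classical probable error of a product-moment correlation r from n subjects."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

def equivalent_n(r, probable_error):
    """Number of subjects whose product-moment r would carry the given probable error."""
    return (0.6745 * (1.0 - r * r) / probable_error) ** 2

r_typical = 0.30                                # average correlation reported in the text
pe_240 = probable_error_r(r_typical, 240)       # product-moment, 240 subjects
pe_100 = probable_error_r(r_typical, 100)       # product-moment, 100 subjects
n_equiv = equivalent_n(r_typical, 0.07)         # subjects matching a probable error of .07

print(round(pe_240, 3), round(pe_100, 3), round(n_equiv))
```

Under these assumptions the probable error for two hundred forty subjects comes to about .04, and an error of .07 corresponds to fewer than one hundred subjects treated by the product-moment method, in agreement with the statement above.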

Even more disastrous—and much less excusable—has been the 
fulfilment of the third and last condition of submersion in error; namely,
the exceedingly large number of factors introduced; no less than 
twelve. What the maximizing principle would appear to have really 
effected is to replace one conspicuous large general factor by a dozen 
inconspicuous small ones. 
















On the whole, then, the present revolutionary results would appear 
to derive, not from any cogent facts or logic, but only from a statistical 
accident. 


IV. COMPARISON WITH RESULTS OF OLD METHOD 


1. Method of Two Factors.—In order to illuminate this new method 
of analysis in more detail, however, let us for comparison submit 
the same experimental data to the older method employed in the 
theory of Two Factors. This earlier theory is based on the following 
theorem. When any four correlations satisfy the equation, 


\[ r_{xy}\, r_{zw} - r_{xz}\, r_{yw} = 0 \qquad \text{(A)} \]


then always, and then only, all these four correlations can be derived
from a single factor. The theory undertakes to show that such a
condition as is given in (A) always arises whenever the variables x, y,
z, w, are sufficiently dissimilar. Yet more, it aims at indicating just 
the kind and degree of similarity (or other tie) at which no single 
factor remains adequate. 

One much used way of developing this theory further is by means
of the following additional theorem: Denoting the said general factor
by G, the supplementary proposition has been deduced that—

\[ r_{xg} = \sqrt{\frac{r_{xy}\, r_{xz}}{r_{yz}}} \qquad \text{(B)}^{1} \]

and similarly:

\[ r_{yg} = \sqrt{\frac{r_{xy}\, r_{yz}}{r_{xz}}} \]

Thence we can by known formulae derive the “residual” correlation:

\[ r_{xy \cdot g} = r_{xy} - r_{xg}\, r_{yg} \qquad \text{(C)}^{2} \]

Commonly, there are numerous pairs of variables, such as y and z,
that all satisfy (A) and therefore are equally valid for determining $r_{xg}$.
Then, of course, every such pair—save for the experimental errors— 
should lead to the same value for it. On the other hand, with some 





1 For these further theorems, reference may be made to my own work, The
Abilities of Man, 2nd ed., 1932, pp. i-xxiii (or better, to Holzinger’s work quoted
in foot-note 1, on p. 6).

2 Formerly it was usual to employ, not the “residual” but the “partial”
correlations. For the usage of these, see ibidem, pp. xxii-xxiii. But the eventual
result is either way much the same.




pairs the proposition (A) may be of doubtful validity; far from being 
assumed, it may be expressly put on trial. 

Accordingly, the general procedure is: First, to pick out all such 
pairs as are most certainly above suspicion; next, to make sure that 
these different pairs sufficiently corroborate one another; and, finally, 
to minimize their experimental errors by taking their result on an 
average. 
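
The procedure just outlined can be sketched in Python as follows. The five tests and their correlations are hypothetical: they are built from a single factor, with a small extra tie added between z and w to stand in for a group factor. The function names are assumptions of this sketch and do not come from the original work; the formulas are (A), (B) and (C) as given above.

```python
import math
from itertools import combinations

def tetrad_difference(r, x, y, z, w):
    """Left-hand side of (A): r_xy * r_zw - r_xz * r_yw."""
    return r[x][y] * r[z][w] - r[x][z] * r[y][w]

def g_loading(r, x, y, z):
    """Formula (B): correlation of x with G from one pair of reference tests y, z."""
    return math.sqrt(r[x][y] * r[x][z] / r[y][z])

def averaged_g_loading(r, x, references):
    """Average the result of (B) over every pair of admissible reference tests."""
    values = [g_loading(r, x, y, z) for y, z in combinations(references, 2)]
    return sum(values) / len(values)

def residual(r, x, y, g):
    """Formula (C): correlation left between x and y after removing G."""
    return r[x][y] - g[x] * g[y]

# Hypothetical correlations generated by one factor, plus an extra tie between z and w.
true_g = {"x": 0.8, "y": 0.7, "z": 0.6, "w": 0.5, "v": 0.4}
names = list(true_g)
r = {a: {b: true_g[a] * true_g[b] for b in names if b != a} for a in names}
r["z"]["w"] += 0.10
r["w"]["z"] += 0.10

clean = ("x", "y", "v")   # reference tests assumed to share nothing but G
g = {t: averaged_g_loading(r, t, [u for u in clean if u != t]) for t in names}

clean_tetrad = tetrad_difference(r, "x", "y", "z", "v")     # no suspect pair involved
suspect_tetrad = tetrad_difference(r, "x", "z", "y", "w")   # involves the tied pair z, w
zw_residual = residual(r, "z", "w", g)

print(round(g["z"], 3), round(g["w"], 3), round(zw_residual, 3))
```

In this constructed case the tetrad difference vanishes for the unsuspected tests, fails to vanish when the tied pair enters, and the residual correlation between z and w recovers the extra tie of .10 that was put in.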

A very suitable use to be made of this procedure is in the case 
where—as here—the tests are already divided up into several classes. 
Obviously, those tests which belong to one and the same class are 
likely to possess common factors not shared by most members of the 
other classes. Accordingly, each such pair of “reference values” as y
and z in (B), should—provisionally, at any rate—be selected from
different classes; different, both from one another and from that to which
x belongs. Then the respective results from all the pairs are averaged.


TABLE I.—AVERAGE CORRELATIONS
(Decimal Points Omitted)

Class names               Tests     1    2    3    4    5    6    7    8    9   10
 1. [illegible]            1-8     497  385  340  347  234  322  521  558  260  395
 2. [illegible]            9-16    ...  461  240  232  239  234  415  392  270  372
 3. [illegible]           17-25    ...  ...  442  420  262  331  363  486  219  241
 4. Form                  26-29    ...  ...  ...  350  266  296  303  427  201  220
 5. [illegible]           30-35    ...  ...  ...  ...  645  379  296  352  177  243
 6. Numerical reasoning   36-39    ...  ...  ...  ...  ...  402  390  437  199  271
 7. Verbal reasoning      40-42    ...  ...  ...  ...  ...  ...  665  526  265  337
 8. Space reasoning       43-45    ...  ...  ...  ...  ...  ...  ...  693  276  330
 9. Rote learning         46-51    ...  ...  ...  ...  ...  ...  ...  ...  [illegible]
10. Unclassified          52-60    ...  ...  ...  ...  ...  ...  ...  ...  ...  333


























Another mode of compounding tests is also very serviceable,
especially for a preliminary general orientation. Here, all the
test-scores for each class are summed together, so as jointly to constitute
a single test of wider range.2 In this way, our original fifty-seven tests
can be reduced to only ten.
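
A minimal sketch of such a composition is given here in Python. It assumes that every test score is standardized and enters its class sum with unit weight, so that the variance of a sum of p tests is p plus twice the sum of their mutual correlations; this is the ordinary algebra of sums and is not necessarily the exact form of the formulae referred to. The function name and the numerical values are hypothetical.

```python
import math

def correlation_of_sums(p, within_a, q, within_b, between):
    """Correlation between two class sums of standardized, unit-weighted tests.

    p, q:      number of tests in class A and in class B
    within_a:  correlations between distinct tests of A, each pair listed once
    within_b:  the same for class B
    between:   correlations of every test of A with every test of B (p * q values)
    """
    var_a = p + 2 * sum(within_a)    # variance of the sum of A's standardized scores
    var_b = q + 2 * sum(within_b)    # variance of the sum of B's standardized scores
    return sum(between) / math.sqrt(var_a * var_b)

# Hypothetical classes of two tests each: tests correlate .50 within A, .40 within B,
# and .30 across the two classes.
r_sum = correlation_of_sums(2, [0.50], 2, [0.40], [0.30] * 4)
print(round(r_sum, 3))
```

In this example the two sums correlate more highly than any single pair of tests drawn from the two classes, which is the usual effect of widening the range of a test.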





1 An alternative to any such classification is supplied by the “bi-factor”
method of Holzinger; see his Manual quoted in foot-note 1, on p. 6.

2 The formulae for effecting all such compositions involve very little labour;
they are given in the present author’s “Correlations of Sums and Differences.”
Brit. J. Psychol., 1913.













A less strict but still simpler artifice than the summing of the scores
for each class consists in averaging the correlations of each.

2. General Results.—Proceeding in this last fashion, we get the
correlations shown in Table I. And from these we are able, by means
of formulas (B) and (C), to derive the residual correlations given in
Table II.
TABLE II.—RESIDUAL CORRELATIONS
(Decimal Points Omitted)

         1     2     3     4     5     6     7     8     9    10
 1     +056  +030  -035  -002  -078  -048  +067  +037  -001  +059
 2      ...  +175  -061  -048  -012  -007  +051  -018  +059  +101
 3      ...   ...  +222  +124  -003  +016  -022  +054  -004  -045
 4      ...   ...   ...  +076  +022  +005  -055  +026  -006  -046
 5      ...   ...   ...   ...  +325  +118  -024  -008  -004  +005
 6      ...   ...   ...   ...   ...  +092  +010  -011  -021  -010
 7      ...   ...   ...   ...   ...   ...  +101  +005  +005  -008
 8      ...   ...   ...   ...   ...   ...   ...  -005  -026  -059
 9      ...   ...   ...   ...   ...   ...   ...   ...  +158  +021
10      ...   ...   ...   ...   ...   ...   ...   ...   ...  +076



































Now, if in the latter table the correlations under .10 be regarded 
as insignificant, we find that this amount is only exceeded five times 
out of all the values lying on the diagonal line (type $r_{xx \cdot g}$), and only
three times out of the remaining values (type $r_{xy \cdot g}$). We may notice,
further, that all these significant residual correlations deal specially
(according to their class designations, at any rate) with language,
space, number, and memory. Moreover, they fall into four separate
groups, each of which can be treated separately by (B) again, and
will then produce a separate factor of its own.1 Putting all the factors
together and omitting zeros, we get the matrix as shown on page 13. 
From the factors in the preceding matrix we can proceed to recon- 
struct the original correlations given in Table I. The average dis- 
crepancy proves to be only .03. Such a fit, obtained by the old method 
in this exceedingly simple manner, would appear to be actually better 
than that which was attained by all the multitudinous calculations, 
rotations, and factorizations involved in the new method. 
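
The kind of reconstruction meant here can be illustrated for a few of the classes by a short Python sketch. The loadings are those printed in the factorial matrix for classes 3 to 6, and the observed values are the corresponding entries of Table I; the assignment of each second loading to S or to N is inferred from the text rather than read directly, and the helper name `reproduced` is hypothetical. Only five of the correlations are checked, so the resulting figure is not the average discrepancy reported for the whole table.

```python
# Loadings for classes 3 to 6 as read from the factorial matrix.
# Assumption: the second loading of classes 3 and 4 belongs to S, that of 5 and 6 to N.
LOADINGS = {
    3: {"G": 0.566, "S": 0.462},
    4: {"G": 0.524, "S": 0.259},
    5: {"G": 0.470, "N": 0.570},
    6: {"G": 0.557, "N": 0.207},
}

# Observed average correlations between these classes, from Table I.
OBSERVED = {(3, 4): 0.420, (3, 5): 0.262, (3, 6): 0.331, (4, 5): 0.266, (4, 6): 0.296}

def reproduced(a, b):
    """Correlation implied by independent factors: sum of products of shared loadings."""
    shared = LOADINGS[a].keys() & LOADINGS[b].keys()
    return sum(LOADINGS[a][f] * LOADINGS[b][f] for f in shared)

gaps = {pair: abs(OBSERVED[pair] - reproduced(*pair)) for pair in OBSERVED}
mean_gap = sum(gaps.values()) / len(gaps)

for pair in OBSERVED:
    print(pair, round(reproduced(*pair), 3), OBSERVED[pair])
print(round(mean_gap, 3))
```

For these five pairs the reconstructed values lie within about .02 of the observed ones.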

After all such summary examination of the averaged correlations— 
those between different classes as also between different members of 





1 See repeatedly in The Abilities of Man, quoted in foot-note 1 on p. 10. 










the same class—there remain for our consideration the original indi- 
vidual correlations between single tests. 


FACTORIAL MATRIX
(Zeros Omitted)

                    Factors
Tests      G      V      S      N      M
  1      .665   .160
  2      .535   .367
  3      .566          .462
  4      .524          .259
  5      .470                 .570
  6      .557                 .207
  7      .681
  8      .767
  9      .395                        .397
 10      .508   .213




















In general, these may be expected to have much less importance. 
For the fact of their effects not being noticeable in any of the averages 
would seem to indicate that their respective ranges are narrow and, 
moreover, that their weights are comparatively slight. 

From time to time, however, a scrutiny even of these may reveal 
correlations of great interest. For instance, a very brief inspection 
should be enough to detect the outstanding magnitude of the correla- 
tion (.71), between Verbal Classification (test 6) and Identical Forms 
(test 26). To any psychologist familiar with the previous investi- 
gation of factors, this correlation will recall one which had been 
detected by Holzinger, and attributed by him to the factor of “mental
speed.” Here is a suggestion easy enough to follow up; on the whole,
it would appear to be verified. 

3. Points of Agreement.—Let us now compare all these results 
obtained by the two factor method with those which we before found 
to have been reached by the method of rotating axes. 

As regards the group factors at any rate, there is after all a very 
large degree of agreement. For in both cases alike, the three leading 
factors are said to deal with the materials designated, respectively, 
as verbal, spatial, and numerical. In both cases, too, we also find a 
factor of memory and one of speed. 
















On all these points, moreover, the results of both methods applied 
to the present research are in good accord with those previously 
obtained on many occasions elsewhere.1

And, in truth, such a concordance manifested by the new method 
with all other accredited studies was only to be expected. For, as we 
have seen, the new operation consisted essentially in scattering G 
among such numerous group factors, that the fragment assigned to 
each separately became too small to be noticeable. 

Up to this point then, the factorial results of Thurstone, although 
obtained by the new and disputable method of maximizing the number 
of zeros, admit, nevertheless, of being approximately legitimated by 
the long established method of two factors.2

4. Points of Disagreement.—In other respects, however, the agree- 
ment becomes less satisfactory. The new method generates many 
further factors which can claim little or no support anywhere else. 
As a case of this sort we may or may not regard the distinction which 
the method has produced between one factor of “verbal relations” (V)
and another characterized as “fluency in dealing with words” (W).
Much more definite becomes the break with all the results obtained
elsewhere when we are presented with the novel factor of Induction (I),
which has the disconcertingly vague function of “finding a rule or
principle for each item in the test.” Then arrive two more factors
about which Thurstone himself concedes that their “interpretation
is tentative and not so clear” (R and D). Finally come four yet
further factors for which he does not attempt any psychological 
interpretation at all. 

The suggestion arises that few if any of the new unverified, and 
even unintelligible, factors are anything more than statistical artifacts. 

Graver still becomes the conflict of views when we turn from the 
group factors to the general one, or “G.” For by the denial of this,
as we have seen, the whole scheme of factors is completely trans- 
formed. The distribution of weight jumps from extreme heaping and 
minimum scatter to extreme levelling and maximum scatter. 

The effect of this denial is enhanced, too, by its intransigence. 
Thurstone not only analyses the data in his own preferred manner; he 
makes little suggestion that other modes of analysis are also 





1 See repeatedly in The Abilities of Man, quoted in footnote on p. 10. 

2 Since these lines were written, a rigorous proof of this extensive agreement 
between the group factors arising from the new and the old methods, respectively, 
has been published by Holzinger and Harman, Psychometrika, 1938. 










possible, even at the same time. Again, he not only adopts his 
principle of maximizing the number of zero entries, but does so, appar- 
ently, without concern for any ensuing violation of the standard 
principle of minimizing the number of factors. 

As yet another fundamental difference between Thurstone’s 
analysis and ours, we may recall his claim to have arrived at seven 
“primary abilities” (V, N, S, M, P, W, and I). The phrase is indeed
arresting enough. But is it clear? In its fullest significance, it 
would seem to indicate that the remaining five factors are somehow 
derived from the said seven. But this is immediately precluded by 
the fact that all twelve are mutually independent. In its weakest 
significance, it might mean that the seven greatly exceed the five in 
weight or loading. But this, too, is at once contradicted; both alike 
average .18. 

In opposition to the claim of any such factors to be “primary,”
the theory of two factors would say that they really represent nothing 
more than those constituents of the tests with respect to which two and 
more of them happen to overlap. And all such overlapping (save 
in the case of G) depends altogether on which tests may happen to have 
been put together. By this and other considerations we are led to 
the view that group factors, far from constituting a small number of 
sharply cut “primary” abilities, are endless in number, indefinitely
varying in scope, and even instable in existence. Any constituent of 
ability can become a group factor. Any can cease being so. Science 
can do no more than pick out those of them which happen to have or 
acquire sufficient breadth of scope to make their study worth while. 

All this comparison between the two methods of factor analysis— 
the one producing a G and the other eliminating it—may be summa- 
rized as follows: 

As regards general scientific status both alike create hypothetical 
factors which are then well verified by fitting the observed correlations. 
But in the case of the G method, the fit proves to be even better. 

As regards statistical principle, the non-G analysis rests on the 
inadequate rule of maximizing the number of zeros, whereas the G 
analysis prefers the established rule of minimizing the number of factors.

As regards psychological support, the non-G analysis leaves many 
more mental observations out of account, and on the other hand 
introduces many more statistical values that are mentally unin- 
telligible. Furthermore, this analysis leads to an untenable revival 
of the ancient doctrine of “faculties.”



















As regards, finally, the very practical matter of computation, the 
G method, far from being, as often asserted, much more laborious than 
the other one, claims to be much less so. 


CONCLUSION 


We have been presented with a research about which it is difficult 
to speak in terms that do not appear exaggerated. For interest of the 
scientific problem, for novelty of the method introduced to solve it, 
for comprehensive scale of the investigation to which this method is 
applied, as also, above all, for the authoritative status of the author, for 
such virtues it would be difficult to surpass the Primary Mental 
Abilities just published by Thurstone. 

And no unequal match to this scientific promise is the novelty of 
its realization. The results are the opposite to anything ever obtained 
in any previous researches, including those of the investigator himself. 
On the negative side there has been found a complete absence of any 
general factor. And on the positive side, there has been drawn up 
what claims to constitute a full array of “primary abilities.” 

But, unfortunately, “the best laid schemes of mice and men gang 
aft a-gley.” Our scrutiny has led us up to the perturbing discovery, 
that the new method or version of multiple factor analysis has at any 
rate an Achilles heel. It is liable to go completely astray in a certain 
application of questionable principles to inadequate data. And just 
this unfortunate conjunction would appear to have here been realized. 

Nevertheless, the situation is not beyond at least partial remedy. 
The new experimental data, whatever difficulties they may encounter 
when treated by the new method, have been found readily amenable 
to the older one. And in this fashion the scientific harvest can, after 
all, be salvaged. 










THE SELECTION OF UPPER AND LOWER GROUPS 
FOR THE VALIDATION OF TEST ITEMS 


TRUMAN L. KELLEY 


Harvard University 


It is not intended here to discuss the general problem of item 
validation, but only that aspect of it that arises when an upper and 
lower group are selected to serve as standard groups in the differentia- 
tion of test items. It is argued that the more indubitably it is known 
that the upper group is superior to the lower group, the more definitely 
can it be concluded that an item is valid by finding that the upper 
group is more successful in passing it than the lower group. If, in two 
situations, one in which the upper and lower groups are differentiated 
with high certainty and the other with little certainty, the proportion 
of passes (i.e., right answers) in the upper groups are equal and equally 
superior to the proportion of passes in the lower groups, we should 
believe that the item represented in the first situation is more valid 
than that of the second situation. 

Having available an initial group which is normally distributed 
with reference to a desired criterion, we set the problem of selecting 
upper and lower portions of this group which will be most efficient in 
the study of items, and their selection or rejection. The items in 
question are capable of two grades only, right or wrong. We further 
limit the issue by not here considering the interrelationship of items, 
a matter of first importance when the final test to be constructed is to 
contain more than one item. It is granted that the problem as set is 
too constricted to be “real,” but it is, nevertheless, believed that its 
solution is commonly pertinent to the handling of real item selection 
problems. 

The writer has stated¹ that twenty-seven per cent should be selected 
at each extreme to yield upper and lower groups which are most 
indubitably different with respect to the trait in question. This 
article does not alter that conclusion but does provide a more available 
and somewhat improved derivation. 

Let us be given graduated scores on a test or trait from a sample of 
size N. For simplicity we shall consider N to be even, so that we may 





1 Reported by Milton B. Jensen: The Objective Differentiation of Three Groups 
in Education: Teachers, Research Workers, and Administrators. Genetic Psychol- 
ogy Monographs, May 1928. There is an error in the derivation as reported, 
attributable to the present writer, not affecting the final outcome. 

















write $2m = N$, and we shall further let it be an observed fact that the 
distribution of scores is normal. It is not difficult to prove, though we 
shall not stop to do so, that if $2j$ individuals are to be chosen, the best 
results will arise if we choose $j$ at the bottom of the distribution as one 
group, and $j$ at the top as the other. Let the scores of the $N$ subjects, 
in order from high to low, as deviations from the mean, be 
$x_a, x_b, \ldots, x_j, \ldots, x_m, -x_m, \ldots, -x_j, \ldots, -x_b, -x_a$, and let the standard 
deviation of these scores be $\sigma$. If $j$ individuals at the top and $j$ at 
the bottom constitute the upper and lower groups, the certainty with 
which these groups are differentiated is given by the difference between 
their means divided by the standard deviation of this difference. 
Accordingly, the statistical procedure suggested is to determine $j$ so 
that this critical ratio is a maximum. This statement of the problem 
requires qualification for the following reason: 

The ordinary formula for the standard deviation of the mean of $j$ 
measures is $\sigma_j/\sqrt{j - 1}$, where $\sigma_j$ is the standard deviation of the 
distribution of $j$ measures. This, however, is not the standard deviation 
that concerns us, for we wish to know the variability in the mean, $M_j$, 
not as consequent to a particular sampling of $j$ individuals, but only as 
consequent to the chance errors in the scores of the particular $j$ 
individuals in our sample. A score, not chosen at random, but because it 
has a particular deviation, such as is the score $x_a$, it having been chosen 
because it was the largest, suffers from a systematic error as well as a 
chance error. The systematic error can be allowed for by regressing 
this score toward the mean by the factor $r$, where $r$ is the reliability 
coefficient for the measure in question and the group in question. 
Accordingly, the quantity $(rx_a)$ is a quantity having no systematic 
error. It is the estimated true score for individual $a$. It has a chance 
error. The standard deviation of such chance errors is $\sigma\sqrt{r - r^2}$ and, 
happily, this is likewise the chance error for all other regressed scores 
$(rx_b)$, $(rx_j)$, etc. Thus the measures that we shall deal with are 
$(rx_a), (rx_b), \ldots, (rx_j)$, yielding the mean $(rM_j)$ having a standard 
error given by $\sigma\sqrt{r - r^2}/\sqrt{j}$, and the corresponding measures from 
the lower end of the distribution. With groups of $j$ individuals drawn 
from the ends of the distribution, the two means are $(rM_j)$ and $(-rM_j)$, 
yielding a difference between means of $2rM_j$, and the standard deviation 
of this difference is $(\sqrt{2}\,\sigma\sqrt{r - r^2})/\sqrt{j}$. Dividing the difference 
between means by its standard deviation, we obtain as the function 
to be maximized: 








$$f = \frac{2rM_j}{\sqrt{2}\,\sigma\sqrt{r - r^2}/\sqrt{j}} = \frac{\sqrt{2}\,r}{\sigma\sqrt{r - r^2}}\left(M_j\sqrt{j}\right)$$




Neglecting as trivial the case $r = 0$ when, of course, no selection of 
groups will be useful, we see that $f$ is a maximum when the term in 
parentheses, which is merely a function of $j$, is a maximum. The mean of 
the tail of a unit normal distribution is $z/q$, in which $z$ is the ordinate 
at the point of dichotomy and $q$ is the proportion of cases in the tail, or 
$q = j/N$. We thus have 

$$f = \text{constant}\cdot\frac{z}{\sqrt{q}}$$

The first derivative of this function with respect to $x$, where $x$ and $-x$ 
are the points of division in a unit normal distribution giving the 
upper and lower groups, is readily obtained since in a unit normal 
distribution $dz/dx = -xz$ and $dq/dx = -z$. The derivative is 

$$\frac{df}{dx} = \text{constant}\left(\frac{-xz}{\sqrt{q}} + \frac{z^2}{2q^{3/2}}\right) = \text{constant}\cdot\frac{z}{q^{3/2}}\left(-xq + \frac{z}{2}\right)$$

The proportion $q$ can vary between the limits .5 and .0. This derivative 
$= 0$ when $q = 0$ or $q = z/2x$, the first value corresponding to 
$f$ = a minimum and the second to $f$ = a maximum. With the aid 
of the writer’s new tables¹ of the normal probability functions, giving 
$x$ and $z$ to eight decimal places for four-place arguments in $q$, we find 
that $q = z/2x$ when $q = .2702678$. 
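
The condition $q = z/2x$ can also be checked numerically, without printed tables. The following is a minimal sketch in Python, assuming the standard library's `statistics.NormalDist` for the unit normal ordinate and tail area. The function names and the bisection bracket are illustrative choices and are not part of the derivation above.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def tail_proportion(x: float) -> float:
    """q: proportion of cases lying above the point of division x."""
    return 1.0 - UNIT_NORMAL.cdf(x)


def ordinate(x: float) -> float:
    """z: height of the unit normal curve at the point of division x."""
    return UNIT_NORMAL.pdf(x)


def stationarity_gap(x: float) -> float:
    """q - z/(2x), which vanishes where z / sqrt(q) is a maximum."""
    return tail_proportion(x) - ordinate(x) / (2.0 * x)


def optimal_tail_proportion(lo: float = 0.1, hi: float = 3.0) -> float:
    """Bisect for the point of division where q = z/(2x) and return that q."""
    # Assumed bracket: the gap is negative at lo and positive at hi.
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if stationarity_gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return tail_proportion(0.5 * (lo + hi))


if __name__ == "__main__":
    print(f"optimal tail proportion q = {optimal_tail_proportion():.5f}")
```

Bisection is used here only because the gap changes sign once inside the bracket; any root-finding routine would serve equally well.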

This outcome is entirely logical in spite of its apparent violation of 
the principle that any statistical treatment involving the throwing 
away of some of the original data will be less efficient than one using 
all the available information. It is certainly true that if scores of the 
entire distribution were used at their proper values wherein 
$x_a > x_b > \ldots > x_j > \ldots$, etc., then discarding any middle portion would 
constitute throwing away useful data. However, if upper and lower 
groups are formed and, after having been formed, no use of the fact 
that $x_a > x_b > \ldots > x_j$ is made in evaluating the performance of a 
test item, we are not using the original data in its full significance. 
Under this particular procedure we will do better to omit, e.g., individual 
$m$, from the upper group since we are unable to include him, and 
attach the minute importance to his performance that should be 
attached to it because of the smallness of the deviation of his criterion 
score, $x_m$. 





1 The Kelley Statistical Tables. New York: The Macmillan Co., 1938. 



















We, therefore, conclude that if no distinction is made among the 
members of our upper and lower groups separately, when studying the 
performance of items, we should, in general, select the twenty-seven 
per cent highest on the criterion measure for the upper group 
and the twenty-seven per cent lowest for the lower group. 
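
As an illustration of this rule, the following minimal Python sketch forms the two extreme groups from a list of criterion scores and compares the proportion of passes on one item. The function names, the rounding of the group size, the handling of tied scores, and the sample data are all assumptions made for the sake of the example.

```python
from typing import List, Sequence, Tuple


def extreme_groups(
    criterion: Sequence[float], fraction: float = 0.27
) -> Tuple[List[int], List[int]]:
    """Return (upper, lower) lists of subject indices for the extreme groups."""
    n_each = round(fraction * len(criterion))
    if n_each < 1:
        raise ValueError("sample too small for the requested fraction")
    # Rank subjects from highest to lowest criterion score (ties keep input order).
    ranked = sorted(range(len(criterion)), key=lambda i: criterion[i], reverse=True)
    return ranked[:n_each], ranked[-n_each:]


def pass_proportions(
    item_right: Sequence[bool], upper: Sequence[int], lower: Sequence[int]
) -> Tuple[float, float]:
    """Proportion of right answers on one item in the upper and lower groups."""
    p_upper = sum(item_right[i] for i in upper) / len(upper)
    p_lower = sum(item_right[i] for i in lower) / len(lower)
    return p_upper, p_lower


if __name__ == "__main__":
    # Hypothetical data: 100 subjects; the item is passed by the top half only.
    criterion_scores = list(range(100))
    item_right = [score >= 50 for score in criterion_scores]
    upper, lower = extreme_groups(criterion_scores)
    print(len(upper), len(lower), pass_proportions(item_right, upper, lower))
```

Within each group every subject counts equally, which is the condition under which the twenty-seven per cent rule was derived.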

Let the number of items in the criterion $= t$ and assume that some 
item, $i$, not already in the criterion, is being examined. It is not 
infrequent in practice to examine the performance of an item by the 
study of groups differentiated upon the basis of their scores upon a 
total test including the item in question. Logically this procedure is 
undesirable, but if the item is, say, only one out of a hundred, the 
spuriously high validity gotten for the item may not be serious 
and the procedure undoubtedly conserves computational labor. We 
here assume that the item being studied is independent of the criterion. 
If the item being studied is of reliability $r_i$ and equal to the average 
reliability of the items yielding the criterion, the Spearman-Brown 
step-up formula gives its value if the reliability, $r$, of the criterion is 
known, for we have, 

$$r = \frac{t\,r_i}{1 + (t - 1)\,r_i}$$

or, 

$$r_i = \frac{r}{t - (t - 1)\,r}$$


If, for a sample of $N$ drawn from a single grade level, the criterion 
has one hundred items and a reliability of .95, we have an excellent 
measure as judged by available psychological measures in education 
and psychology. In this instance $r_i = .16$. We may call .16 the 
average reliability of an item in an excellent test. A more usual 
situation would be represented by a criterion reliability of .80. In 
this case $r_i = .04$. This is low, but such items are by no means useless, 
for one hundred such yield a test of reliability .80. 
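
The two item reliabilities just quoted can be reproduced with a short sketch of the Spearman-Brown formula and its inverse. The function names below are illustrative only.

```python
def stepped_up_reliability(item_reliability: float, n_items: int) -> float:
    """Spearman-Brown: reliability of a test of n_items equally reliable items."""
    return n_items * item_reliability / (1.0 + (n_items - 1) * item_reliability)


def average_item_reliability(test_reliability: float, n_items: int) -> float:
    """Inverse of the step-up formula: reliability of the average single item."""
    return test_reliability / (n_items - (n_items - 1) * test_reliability)


if __name__ == "__main__":
    for criterion_reliability in (0.95, 0.80):
        r_item = average_item_reliability(criterion_reliability, 100)
        # Stepping the item reliability back up should recover the criterion value.
        recovered = stepped_up_reliability(r_item, 100)
        print(f"{criterion_reliability:.2f} -> {r_item:.4f} -> {recovered:.2f}")
```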

Let us now examine the performance of upper and lower groups 
upon an item of reliability .04. A technique handling items of this 
low reliability will clearly handle with greater certainty items of 
higher reliability. If the item and the criterion are unequally reliable 
measures of the same thing, the correlation, $r_{ic}$, of such an item with 
the criterion is given by 

$$r_{ic} = \sqrt{r_i\,r}$$

which, for the case being considered, provides $r_{ic} = .18$. 

Had we the frequencies in the cells of a normal bivariate correlation 
surface for a correlation of .18, and if both the criterion and the item 
measures were given in graduated amounts, we could immediately 
ascertain the proportion of right responses for upper and lower groups 
consisting of any designated percentages. Since, however, the item 
variable permits of only two responses, right or wrong, we are called 
upon to assume that for this variable we have a dichotomy forced 
upon a continuous variable—the continuous variable being the degree 
of comprehension or knowledge of the subjects tested. If this 
comprehension of the item is below a certain point, the subject “fails,” 
while if above he “passes.” The scatter diagram is then as shown 
herewith for an item of average difficulty, i.e., one for which the mean 
score for the group entire is fifty per cent right answers. The reader 
should think of items having a chance score of zero, not of true-false 
items having a chance score of fifty per cent right. The number of 
classes in the criterion variable may be any number desired (here nine 
are chosen for convenience only), but the number in the item variable 
is necessarily two, as shown. 


                                   SCORE UPON CRITERION

                     −2.4   −1.8   −1.2    −.6     .0     .6    1.2    1.8    2.4    Sums
Score upon item
variable
  + or right        .0055  .0177  .0476  .0896  .1179  .1084  .0697  .0312  .0124   .5000
  − or wrong        .0124  .0312  .0697  .1084  .1179  .0896  .0476  .0177  .0055   .5000
Sums                .0179  .0489  .1173  .1980  .2358  .1980  .1173  .0489  .0179  1.0000
Proportion of
right responses      .307   .362   .406   .453   .500   .547   .594   .638   .693





























The criterion scale is in terms of the standard deviation as the unit 
and the frequencies in the cells are proportions of the total frequency. 
The scatter diagram shown is drawn from the table of Volumes of the 
Normal Bivariate Surface, given in Pearson’s Tables for Statisticians 
and Biometricians, Part II, for a correlation of .20, which is close 
enough to .18 for illustrative purposes. 


The ogive shown by the proportion of right responses, as one pro- 
ceeds from less able to more able groups, may be taken as typical, more 
reliable items showing, of course, a steeper ogive and less reliable items 
a flatter one. The following fourfolds represent the situations maintaining 
when different percentages constitute the extreme groups employed: 


EXTREME GROUPS CONSISTING OF 1.79 PER CENT OF THE CASES

Score upon item variable     Lower group     Upper group
           +                     .693            .307
           −                     .307            .693


EXTREME GROUPS CONSISTING OF 6.68 PER CENT OF THE CASES

Score upon item variable     Lower group     Upper group
           +                     .652            .347
           −                     .347            .652


EXTREME GROUPS CONSISTING OF 18.41 PER CENT OF THE CASES

Score upon item variable     Lower group     Upper group
           +                     .637            .363
           −                     .363            .637


EXTREME GROUPS CONSISTING OF 38.21 PER CENT OF THE CASES

Score upon item variable     Lower group     Upper group
           +                     .591            .409
           −                     .409            .591


EXTREME GROUPS CONSISTING OF 50.00 PER CENT OF THE CASES

Score upon item variable     Lower group     Upper group
           +                     .564            .436
           −                     .436            .564



















In the first of the preceding situations the difference in the proportions 
of right responses between the upper and lower groups, 
$p_1 - p_2$, is (.693 − .307) = .386, and the standard error of this 
proportion difference is 

$$\sqrt{\frac{p_1 q_1}{N_1} + \frac{p_2 q_2}{N_2}} = \sqrt{\frac{.693 \times .307}{.0179N} + \frac{.307 \times .693}{.0179N}} = \frac{4.88}{\sqrt{N}}.$$

We accordingly obtain for the critical ratio $.079\sqrt{N}$. Similar critical 
ratios for the other four situations are, in order: $.118\sqrt{N}$, $.144\sqrt{N}$, 
$.142\sqrt{N}$, and $.129\sqrt{N}$. We observe that there exists a maximal 
value for a situation intermediate between the second and the third. 
Locating this with some precision, we find that the critical ratio of the 
difference between the proportions of right responses of an upper and 
a lower group is a maximum when the proportion of cases in the 
sample included in each of these groups is .26185. 
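
The critical ratios quoted above can be recomputed from the cell proportions of the scatter diagram. The sketch below is a minimal Python illustration and makes one assumption: each extreme group is made up of whole criterion classes counted in from the ends. This covers the first four situations. The fifty per cent case requires the middle class to be split, so there the stated proportions .564 and .436 are passed in directly. The function names are illustrative.

```python
from math import sqrt

# Cell proportions of the scatter diagram for a correlation of .20,
# classes ordered from the lowest to the highest criterion score.
RIGHT = [.0055, .0177, .0476, .0896, .1179, .1084, .0697, .0312, .0124]
WRONG = [.0124, .0312, .0697, .1084, .1179, .0896, .0476, .0177, .0055]


def critical_ratio_per_sqrt_n(p_upper: float, p_lower: float, fraction: float) -> float:
    """Critical ratio divided by sqrt(N) for two extreme groups of equal size.

    Each group holds fraction * N cases, so the squared standard error of the
    difference is (p1*q1 + p2*q2) / (fraction * N).
    """
    variance_times_n = (p_upper * (1 - p_upper) + p_lower * (1 - p_lower)) / fraction
    return (p_upper - p_lower) / sqrt(variance_times_n)


def ratio_for_outer_classes(k: int) -> float:
    """Use the k lowest and k highest classes (k from 1 to 4) as the groups."""
    lower_total = sum(RIGHT[:k]) + sum(WRONG[:k])
    upper_total = sum(RIGHT[-k:]) + sum(WRONG[-k:])
    p_lower = sum(RIGHT[:k]) / lower_total
    p_upper = sum(RIGHT[-k:]) / upper_total
    return critical_ratio_per_sqrt_n(p_upper, p_lower, lower_total)


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, round(ratio_for_outer_classes(k), 3))
    print("50 per cent:", round(critical_ratio_per_sqrt_n(.564, .436, .5), 3))
```

Under these assumptions the computed ratios agree with the quoted ones to within about .001, and they rise and then fall as the groups are widened, which is the behavior the argument depends upon.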

That this proportion differs from .27027 is not surprising in view 
of the fact that the situations dealt with are very different. In the 
first situation we considered the differences between means upon an 
item variable normally distributed and capable of finely graduated 
scoring, while in the second we have an item which can be scored 
“right” or “‘wrong” only. That the optimal tail proportions differed 
by but .00842 enables the generalization that tail proportions of 
twenty-five to twenty-seven per cent of the total number in the 
sample will be serviceable quite generally in item analyses—the 
twenty-seven per cent being optimal when the item variable is scored 
in graduated amounts, or when the item reliability is very low, while 
the twenty-five per cent is preferable when the item is scored “‘right”’ 
or ‘‘wrong”’ and the item reliability is fairly high. Since it is always 
simple to discover the items of high reliability, test-makers are pri- 
marily interested in situations which will be best served by tail pro- 
portions of twenty-seven per cent. 

In the analysis made an item was examined for which the per- 
centage of passes for the group entire was fifty. An item yielding a 
small per cent of passes becomes most diagnostic for a less able group 
—a group of such ability, in fact, that there will be fifty per cent of 
passes upon the item. In view of this fact, the writer has not 
attempted to ascertain what proportions should be employed when 
lower and upper groups are established for the study of an item with 
which, say, twenty-five per cent of the group entire succeed. The 
proportions undoubtedly would not be twenty-seven per cent from 
the extremes. Whatever the answer, it scarcely would be adminis- 
tratively feasible to use it, for upper and lower groups cannot well be 






















reconstituted from item to item, and, whatever the answer, we are 
sure it would not be as satisfactory as one obtained by studying the 
performance of the item in a less able group. 

The general conclusions reached are: 

1. Items are best studied in connection with groups yielding 
fifty per cent right responses. 

2. Upper and lower groups consisting of twenty-seven per cent 
from the extremes of the criterion score distribution are optimal for 
the study of test items, provided the differences in criterion scores 
among the members of each group are not utilized. 

3. Modifications from this recommended procedure, due to some- 
what non-normal distributions and to item difficulties not well adapted 
to the group employed, are theoretically desirable, though we may 
believe that they are hardly feasible practically. 










VALIDATION OF PERSONALITY TESTS BY 
OUTSTANDING CHARACTERISTICS OF PUPILS 


R. PINTNER AND G. FORLANO 


Teachers College, Columbia University 


There is no generally acceptable method for the validation of a 
psychological test. Intelligence tests have been correlated with 
school marks, educational achievement tests, teachers’ ratings of 
intelligence, and similar standards. In the field of personality testing, 
the more or less objective standards of scholastic success cannot be 
used, and so we are compelled to depend upon other less satisfactory 
criteria. Case studies made by competent psychologists may be 
useful, but there are not many of these available. The diagnosis of 
the psychiatrist is rarely of help, because he is generally only concerned 
with very maladjusted cases and also because his conception of per- 
sonality traits is not likely to coincide with that of the psychologist. 
So we have to resort to the classroom teacher for help in spite of the 
well-known unreliability of his ratings. This unreliability is partly 
due to the fact that an ordinary classroom teacher rarely knows the 
personality characteristics of all the children in his room, and partly 
due to the difficulty of clearly defining the personality trait which is 
to be rated. We generally put before him a rating scale with many 
personality traits, more or less vaguely described, and ask him to rate 
all the children in his class. In the present study, we have tried a 
different approach in order, if possible, to avoid some of the error 
inherent in the method which demands a rating of all pupils on specified 
traits. In contradistinction to this we asked the teacher to describe 
any outstanding personality characteristic of his pupils. Part of 
our request to him read as follows:¹ 


Please make a list of the pupils in your class and put down after each the 
outstanding personality characteristics. This should include good points as 
well as bad points. Personality characteristics will show themselves in various 
modes of behavior, so do not hesitate to mention behavior characteristics as 
well as more personal qualities. You might, for instance, make comments 
such as these: “Extremely shy,” or “very nervous,” or “always interfering 
with his neighbors and whispering,” or “smart aleck,” or “likes to show off 
in class,” and so forth. 





1 We wish to acknowledge the help rendered by Assistant Superintendent 
John J. Loftus, Board of Education, New York City, in this part of the investigation. 















The emphasis here was upon any striking characteristic that the 
child might show. No clue was given as to the personality traits 
in which we were interested. The teachers responded in various ways; 
some by a single word for each child; others by two or three phrases 
for each child. A few children were not characterized at all. Samples 
of the one-word-response description for a child are: (1) Pleasant; (2) 
very quiet; (3) agreeable. More frequently two or three words were 
used to characterize a child, e.g., (1) nervous; shy with elders; (2) old 
womanish; unselfish; eager to please. Longer descriptions were rare, 
e.g., “Bright, quick, show-off. Loves attention focused on him constantly. 
Never keeps still a minute.” In this way we gathered 
descriptive comments on twelve classes in grades IV and V including 
dull, normal, and bright sections and comprising in all about three 
hundred forty-four boys and two hundred thirty-eight girls. Only 
subjects whose test blanks were completely answered were included. 
For all these cases we had scores on the Aspects of Personality 
Inventory,¹ which attempts to measure ascendance-submission, 
extroversion-introversion, and emotional stability. The highest possible 
score on each of these parts of the Aspects of Personality Test is 
thirty-five. 

The first step in dealing with the teachers’ lists was to go over the 
descriptive words and attempt to combine them into general categories. 
All the lists were thus gone over by three workers independently. 
Afterwards the three workers met together and consolidated their 
independent judgments into a final grouping upon which all could 
agree. This technique resulted in a master list of sixteen major cate- 
gories under which all the descriptive words of the teachers had been 
subsumed, that is, all except a few descriptive words which did not 
refer to any personality characteristic. Some of these categories 
were: (1) Aggressive; (2) domineering; (3) irresponsible; (4) unstable; 
(5) retiring; (6) seeks attention; (7) uncoöperative; (8) agreeable; and 
the like. Under each one of these major categories we had the list of 
descriptive words used by the teachers. For example, under 
“domineering” we find these descriptive words: “Interferes with others”; 
“strong determination”; “likes his own way”; “bossy”; “nervy”; 
“selfish”; “slightly imperious”; and the like. 

The next step was to calculate averages of the scores on the 
Aspects of Personality Test for all the cases coming under each of the 





1 Pintner, R., Loftus, J., Forlano, G., and Alster, B.: Aspects of Personality. 
Test and Manual. World Book Company, Yonkers, New York, 1938. 




sixteen categories for boys and girls separately. If a child’s descrip- 
tion fell into more than one category, his scores were tabulated in all 
corresponding categories. Some of the categories contained very few 
cases and could not, therefore, be used. We have chosen eleven 
categories for investigation: 

















                                              Number of pupils
Category   Trait                               Boys     Girls
I          Aggressive                           10       25
II         Domineering                          31       26
III        Irresponsible                        20       22
IV         Unstable                             49       32
V          Retiring                             99       77
VI         Seeks attention                      52       29
VIII       Agreeable                            41       27
IX         Coöperative                           9       23
X          Socially unadjusted                  17        8
XI         Intelligent                          43       26
XVI        No descriptive traits assigned       39       15
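
The tabulation rule stated above, under which a child described in more than one category contributes his scores to every corresponding category, can be sketched in a few lines of Python. The pupil names, scores, and labels below are hypothetical. Treating a pupil with no description as a category of his own is an assumption that follows the "no descriptive traits" group in the list.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def category_means(
    scores: Dict[str, float], descriptions: Dict[str, List[str]]
) -> Dict[str, float]:
    """Average the test scores of all pupils falling under each category."""
    by_category: Dict[str, List[float]] = defaultdict(list)
    for pupil, labels in descriptions.items():
        # A pupil with several labels is counted once under each of them.
        for label in labels or ["no description"]:
            by_category[label].append(scores[pupil])
    return {label: mean(values) for label, values in by_category.items()}


if __name__ == "__main__":
    # Hypothetical A-S scores and teacher descriptions for three pupils.
    scores = {"pupil_a": 20, "pupil_b": 16, "pupil_c": 12}
    descriptions = {
        "pupil_a": ["aggressive", "domineering"],
        "pupil_b": ["domineering"],
        "pupil_c": [],
    }
    print(category_means(scores, descriptions))
```

In practice the same tabulation would be run separately for boys and girls and for each of the three parts of the test.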








For these eleven categories the norm for each part of the test was 
subtracted from the respective mean of each category and standard 
ratios were computed. These mean differences and standard ratios 
are all shown in Table I for the eleven categories and for boys and girls 
separately. Table I is read as follows: The mean difference obtained 
by subtracting the norm of the A-S part, i.e., the Ascendance-Submission 
section of the test, from the mean score on this A-S part of the 
test by the boys in Category I is .33. The standard ratio of this mean 
difference is .22. The other parts of the test—E-I (Extroversion- 
Introversion) and E (Emotionality)—are to be read in the same 
manner. 
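
Since the standard ratio is the mean difference divided by the standard error of that difference, either quantity can be recovered from the other two. The following minimal Python sketch uses the two figures just quoted for the boys in Category I. The function names are illustrative, and the standard error it prints is only the value implied by the rounded table entries, not a figure reported in the study.

```python
def standard_ratio(mean_difference: float, standard_error: float) -> float:
    """Ratio of a mean difference to the standard error of that difference."""
    return mean_difference / standard_error


def implied_standard_error(mean_difference: float, ratio: float) -> float:
    """Standard error implied by a tabled mean difference and standard ratio."""
    return mean_difference / ratio


if __name__ == "__main__":
    # Boys, Category I, A-S part: mean difference .33, standard ratio .22.
    se = implied_standard_error(0.33, 0.22)
    print(f"implied standard error = {se:.2f}")
    print(f"standard ratio         = {standard_ratio(0.33, se):.2f}")
```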

Category I includes all children described as aggressive in some 
manner or other. There are only ten boys and twenty-five girls. 
For both boys and girls we note higher means for A-S scores and 
for E-I scores. Children described as ‘‘aggressive” tend to be more 
ascendant and more extraverted. The trend is clear, but the differences 
are not statistically significant. The number of cases is very 
small. 

In Category II—“domineering”—there are thirty-one boys. 
Their mean score on the Ascendance-Submission part of the personality 
test was 18.66. This score is 1.29 above the norm for boys 
on this test. These boys, therefore, who are characterized by their 
teachers as “domineering” obtain higher scores, i.e., are more ascendant, 
than boys in general. Their scores tend to go in the expected 


TABLE I.—RELIABILITY OF MEAN DIFFERENCES (MEAN CATEGORY SCORE − NORM) 
ON SECTIONS OF THE ASPECTS OF PERSONALITY TEST FOR ELEVEN CATEGORIES 
GIVEN SEPARATELY FOR EACH SEX 

                                       Boys                     Girls
Category                         A-S     E-I      E       A-S     E-I      E
I       Mean difference          .33     .40    −.53     1.00     .69     .24
        Standard ratio¹          .22     .36    −.32     1.17     .87     .19
II      Mean difference         1.29     .97    −.99      .90     .64     .98
        Standard ratio          1.54    1.59    −.74     1.08     .86     .79
III     Mean difference          .83    −.05   −1.97      .13     .45   −1.35
        Standard ratio           .94    −.05   −1.24      .13     .47   −1.11
IV      Mean difference          .62    −.66   −1.37      .90    −.52   −1.85
        Standard ratio           .88   −1.11   −1.23     1.21    −.69   −1.53
V       Mean difference         −.73    −.58     .34     −.71    −.29     .60
        Standard ratio         −1.55   −1.30     .51    −1.26    −.63     .71
VI      Mean difference         −.01     .57    −.22     −.84     .05    −.14
        Standard ratio          −.01    1.05    −.25    −1.30     .08    −.13
VIII    Mean difference        −1.38    −.46    −.19    −1.34    −.44    −.55
        Standard ratio         −1.87    −.86    −.17    −1.46    −.59    −.46
IX      Mean difference          .13   −2.13     .47     −.03     .45     .78
        Standard ratio           .09   −1.34     .32     −.03     .64     .62
X       Mean difference          .42    1.08     .05    −1.10   −3.05    −.57
        Standard ratio           .40    1.26     .05     −.70   −2.07    −.38
XI      Mean difference         1.18    1.06    1.73     1.25    1.15     .64
        Standard ratio          1.69    1.67    2.54     1.22    1.94     .65
XVI     Mean difference          .72     .35    1.03     −.07     .o2     .29
        Standard ratio          1.08     .52    1.17     −.06     .50     .17


























1 The Standard Ratio is the ratio of the mean difference to the standard error 
of the difference. 




direction, although the difference is not highly reliable, i.e., the ratio 
is 1.54. The next column tells us that these “domineering” boys are 
slightly more extraverted than normal with a standard ratio of 
1.59. And the last column tells us that they fall slightly below the 
norm in emotional stability, but the ratio of −.74 shows this difference 
to be very unreliable. The twenty-six “domineering” girls 
also show higher scores in the A-S and E-I sections of the Aspects of 
Personality Test and practically the same score on the E section. 
None of these differences are large, but the significant trait comparisons 
are at least in the right direction. 

Category III is “irresponsible.” Here the striking feature is the 
lower score on the E trait, denoting emotional instability, for both 
boys and girls. Children described as “irresponsible” tend to score 
lower on emotional stability, but show no deviation from the norms 
for Ascendance-Submission or Extraversion-Introversion. 

Category IV is “unstable.” Here we should expect the greatest 
difference between the means to occur on the E or emotional stability 
part of the personality test and we should expect lower scores, i.e., 
scores showing more emotional instability. For both boys and girls 
the largest standard ratios do occur in this E section and the mean 
scores are below the norm. These cases then test slightly less 
emotionally stable. Furthermore, both boys and girls score lower on the 
E-I section of the test, i.e., they test more introverted. 

Category V is “retiring.” This may be similar to what the 
psychologist labels “submission,” or what he calls “introversion.” 
We should expect lower scores on both the A-S and E-I parts of the 
test. This is true for both boys and girls. There is less difference on 
the E part, according to expectation. 

Under Category VI are those children who are described as ‘‘seek- 
ing attention.” The boys of this group deviate most on the E-I 
trait, being somewhat more extraverted than the norm. The girls 
deviate most on the A-S test, being somewhat less ascendant than 
the norm. There is no definite trend here. 

Category VIII is called ‘‘agreeable.” This is a neutral kind of 
characteristic. Referring to our master list of descriptive words, we 
find the teachers’ descriptions contain such things as “cheerful, fun- 
loving; good-natured; kindly; sympathetic; patient; sweet-natured; 
willing; anxious to please; amenable; nice; obedient; polite”; etc. 
These descriptions seem on the whole to point to the obedient, tract- 
able child who conforms to the teachers’ rules and regulations. We 










might expect, therefore, on our tests a tendency toward submission 
as the most prominent characteristic, and, examining the table, we 
find that both boys and girls show the most reliable difference in this 
direction, with negligible differences on the other two traits. 

Category IX—“coöperative”—has only nine boys and twenty-three 
girls. The most noticeable difference for the boys is in the 
direction of introversion. The girls show no differences in any of 
the three traits. 

Category X contains very few cases, only seventeen boys and 
eight girls. This is called “socially unadjusted,” and contains such 
descriptions as “self-centered; non-conformer; always interfering; 
unsocial; always complaining; disobedient; unfriendly”; etc. 
It is difficult to bring this in line with any of the three traits measured 
by the test. An examination of the scores shows no general tendency. 
The boys test slightly more extraverted, while the girls test much more 
introverted. The category is probably too mixed and the number of 
cases much too small to lead to any useful conclusion. 

Category XI—“intelligent”—shows a consistent definite trend 
for both boys and girls to score higher in all three traits and, therefore, 
show themselves slightly more ascendant, more extraverted, and more 
emotionally stable. This does not mean that there is a positive 
correlation between these traits and intelligence as measured objec- 
tively by reliable intelligence tests. We are not here concerned with 
that. What we have here seems to point to the tendency for ascendant 
people as contrasted with submissive people to be judged as more 
intelligent, other things being equal. Similarly, extraverts as opposed 
to introverts, and emotionally stable as opposed to emotionally 
unstable individuals will appear more intelligent to the teacher. 

Category XII contains those cases who were assigned no descriptive 
traits by their teachers. Here may be included the colorless children 
who make no impression one way or another upon their teachers. 
Again, new children who have just entered the class will be included 
here. The boys are somewhat more ascendant, but not the girls. 
The test results as a whole show no consistent tendency for both boys 
and girls. 


SUMMARY AND DISCUSSION 


The Aspects of Personality Inventory was given to five hundred 
eighty-two children in several classes. The teachers were asked to 
write down the most outstanding personality characteristics of the
children in their classes. These descriptions were then grouped into
major categories. The scores of the children falling into these various 
categories were then compared with the norms for the test. The 
differences between the means were in general rather small, but the 
general trend of the differences seemed to be in the expected direction. 
This technique may be considered as affording a rough measure of the 
validity of the test. 

It will be remembered that the teachers were asked to describe any 
outstanding personality characteristic of each pupil. This procedure is 
an indirect approach in the application of the rating technique and is 
the reverse of the usual procedure, which requires the teacher to rate 
each child on a series of personality traits listed for the teacher by the 
psychologist. Obviously, before the teacher is asked to describe the 
outstanding personality traits of a pupil there should elapse a proper 
period of time in order that the teacher may become fairly well 
acquainted with the child. 

This indirect approach might very well be used as the initial step
in the construction of a personality inventory, the subtests of which will
be concerned with those traits or behavior trends that teachers can
observe easily and report accurately. Therefore, before launching
upon the construction of tests or inventories seeking to measure
personality traits or emotional attitudes which, though important, are
difficult to analyze and to validate,
attention might very well be concentrated on the construction of
techniques designed for the measurement of those personality traits
more amenable to observation and follow-up.

In the present study it was reported that the various descriptive 
words which served to picture outstanding behavior trends of each 
child were subsumed under sixteen categories. The next step would 
be to select those personality traits which are deemed important for 
further analysis and concerning which there is some agreement. 
Several standards of judging which traits are important may be 
employed. From one point of view the personality traits which 
might be selected for further work might very well be those which 
are concerned chiefly with the individual’s proper adjustment to his 
social milieu. 

Suppose the trait or behavior trend chosen for further study 
is that of domineering-submissive behavior. The ensuing procedure 
would involve the selection of items purporting to describe the various 
aspects of this behavior and the usual situations wherein a person may
display domineering or submissive conduct, attitudes and interests.
Those children whose outstanding and stable characteristic is judged 
to be domineering and the group of children judged to be submissive 
would constitute criterion groups to be used in the selection of signifi- 
cant, valid, and reliable items for the test purporting to measure this 
personality trait. 
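
The use of criterion groups described above can be sketched in a few lines. This is only an illustration under stated assumptions: the article does not say which statistic would be used to pick items, so the difference in endorsement rates between the two groups, the function name `select_items`, the cutoff `min_gap`, and the small answer sheets below are all hypothetical.

```python
def select_items(domineering, submissive, min_gap=0.30):
    """Keep items whose endorsement rate differs between the two
    criterion groups by at least min_gap (an arbitrary, assumed cutoff).

    Each group is a list of answer sheets; each sheet is a list of
    0/1 responses, one per item.
    """
    n_items = len(domineering[0])
    selected = []
    for item in range(n_items):
        p_dom = sum(sheet[item] for sheet in domineering) / len(domineering)
        p_sub = sum(sheet[item] for sheet in submissive) / len(submissive)
        gap = p_dom - p_sub
        if abs(gap) >= min_gap:
            selected.append((item, round(gap, 2)))
    return selected


# Invented answer sheets for four children in each criterion group.
domineering_group = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 1, 0]]
submissive_group = [[0, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]

print(select_items(domineering_group, submissive_group))
```

In this invented example only the first item separates the two groups; the other two are endorsed equally often by both and would be dropped.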










A NEW MEANING FOR EDUCATIONAL PSYCHOLOGY¹
PERCIVAL M. SYMONDS 


Teachers College, Columbia University 


The establishment of the American Association for Applied Psy- 
chology is the signal for giving educational psychology a slightly 
different orientation from that which it has commonly had in the past. 
The term “educational psychology” according to former Dean 
James E. Russell of Teachers College was coined when Thorndike was 
given a faculty appointment at Teachers College, and certainly 
Thorndike’s name has been associated continuously with educational 
psychology since the beginning of the century. However, psychology 
had been applied to educational problems long before Thorndike’s day. 
Rousseau was an educational psychologist and one who is still modern 
in many respects. Herbart was an educational psychologist in the 
period when formal discipline was rampant, although his psychology 
has many features of solid worth beside the now discarded concept 
of formal discipline. William James certainly won a place for himself
as an educational psychologist with his Talks to Teachers on Psychology
and to Students on Some of Life’s Ideals. G. Stanley Hall,
likewise, with his two volumes on Adolescence must be counted as an
educational psychologist.

When Thorndike came on the scene he contributed to an expiring 
movement, then known as Child Study, and his little known volume 
on The Human Nature Club is addressed to that group. This early 
book contains the germ of his later development and exposition of 
Instinct, Learning, Fatigue and Individual Differences, topics which 
have been the backbone of educational psychology during all the 
intervening years. 

Thorndike’s educational psychology, however, has been added to
and modified by two new movements which arose almost simultaneously about
1926. One was precipitated by Goodwin Watson in an article in the
JOURNAL OF EDUCATIONAL PSYCHOLOGY for December, 1926, entitled
“What Should Be Taught in Educational Psychology?”, in which he
pointed out the neglect of certain important topics; namely, those
dealing with personality, adjustment, emotion, mental hygiene, and
problems of home-making and preschool training, which might provide
teachers with a more profound understanding of their pupils as
individuals, rather than of how children learn certain isolated bits of
subject-matter. The Gestalt movement coming from Germany
penetrated educational psychology in this country at about the same
time (1926) in a book by R. M. Ogden, Psychology and Education, and
later (1932) by R. H. Wheeler, Principles of Mental Development, a
Textbook in Educational Psychology.

¹ Address of the retiring vice-president of the Educational Section of the
American Association for Applied Psychology read at Columbus, Ohio, on September
5, 1938.

Originally educational psychology was, perhaps, a philosophy of 
education with special emphasis on the psychological analysis of the 
learnings sought in education. Dewey, therefore, may be thought 
of as one of the most eminent of educational psychologists. Thorn- 
dike, Terman, Gates, Freeman, Pintner and others, however, have 
attempted to build a scientific educational psychology, providing 
education with facts and laws which might serve as the basis of a 
definite technology. Tests have found a central place in the scientific 
movement as the measuring devices on which a scientific psychology 
must be based. 

The Binet intelligence test became the special property of clinical
psychologists from the very start, and it was not until the movement
for the construction of school achievement tests and the school survey
movement were under way that tests became the concern of educational
psychology. Then when group mental tests were introduced by Otis
and the army psychologists, testing almost engulfed educational
psychology. Educational psychology has been considered to be
essential for the classroom teacher in giving him insight into the
operations of the learning process and in helping him to understand
his pupils better.

The title of the new American Association for Applied Psychology 
gives the cue for a new meaning for educational psychology. We are 
now forming a professional association of educational psychologists 
who are expected to make application of psychology to educational 
problems. In the past we have had teachers of educational psychology 
and research workers adding to the facts of educational psychology. 
Now we have educational psychologists—persons expected to apply 
this body of scientific data to educational situations. 

The point of view, then, has shifted from educational psychology 
as a science to educational psychology as an art and a technology. 
The educational psychologist is the new craftsman or technologist 
applying psychology to education. One may well ask how this new
professional group differentiates itself from teachers. I am not sure
that the distinction is clear. Certainly, every teacher ought to be an 
applied psychologist and a technologist applying psychological 
principles to the task of teaching, and with better teacher-training and 
selection in the future, the goal of having every teacher an applied 
psychologist may be reached. At present, however, the educational
psychologist is one who by virtue of his deeper psychological insight 
and more extensive psychological training is able to give expert 
assistance in the education of those deviates who are mentally retarded 
or accelerated, who have special difficulties in school subjects such as 
reading or arithmetic, who are physically handicapped, or who show 
behavior, personality, social, or emotional problems. 

The educational psychologist, however, as a counselor can only in 
rare cases be thought of as a teacher, or the teacher as a counselor. 
A counselor, to be effective, must have only the most permissive rela- 
tions with those whom he counsels. He should be divested of every 
shred of authority which might serve as a threat and barrier to those 
whom he is counseling. The educational psychologist cannot be an 
educator in that he does not maintain standards or adopt an ethical or 
normative point of view, but is concerned only with enabling a person 
to gain inner control and readjustment. The teacher, of course, as a 
representative of society, must represent certain standards of thought, 
behavior, and achievement. So the educational psychologist as a 
counselor occupies a unique position, clearly differentiated from that 
of the teacher. 

Since about 1920 psychologists have given their services to schools
in this special capacity under the title of school psychologists. Because
of his training and special skills, the school psychologist is
known among educators mainly as a person who is responsible for
the testing program of the school. This is an unfortunately limited
meaning that has gathered around the term school psychologist, for
certainly the psychologist who is to serve a school should be equipped
not only to administer and interpret tests, but to understand any
needs that a child may have. Likewise in colleges, personnel work
has been undertaken by persons serving under a variety of titles with
more or less psychological training. It is obvious, however, that the
need for specialists in psychology on the college level is as great as,
or even greater than, on the preceding levels of the educational ladder
because of the continued complexity and seriousness of personal
problems and the problems of learning on the college level.











Since most of the work of the educational psychologist will be with 
individual children, the kind of person whom I have been describing 
has been called by some a clinical psychologist. If by a clinical 
psychologist is meant one who studies individuals psychologically, 
this is a correct appellation, and should be the cause for no concern. 
Perhaps an educational psychologist can render his most effective
service in his clinical work. Even if an educational psychologist 
devotes the major portion of his time to clinical work, there are still 
good reasons why there should be a separate section for educational 
psychology in this organization. Unlike the clinical psychologist in 
other institutions who frequently must work alongside of, or even 
under, a psychiatrist, the educational psychologist most frequently 
works more or less independently, associated with teachers and 
directly responsible to the principal or superintendent of schools as his 
superior officer. Because his work is concerned with the learning 
problems of children or of older students, or with the personal adjust- 
ments of pupils, special emphases must be given in his training which 
would be different from the training of the clinical psychologist who 
is preparing for work in other fields. The educational psychologist,
for instance, should have his psychological training give special emphasis to the
problems which most vitally concern education. He needs fundamental
courses in child development, in the psychology of exceptional
children, and in the psychology of adjustment. He needs to know the 
special methods for diagnosing learning difficulties in different sub- 
jects. Of peculiar importance is his understanding of principles of 
reeducation, guidance, and psychotherapy. But remembering that 
with our new meaning the educational psychologist is more of a 
technician or artist than in the past when knowledge of the subject 
was emphasized, wide opportunities should be given during the 
training period for supervised experience in the arts of diagnosis and 
treatment. 

Finally, the educational psychologist should be able to handle his 
own personal needs and conflicts, at least to the extent that they do 
not interfere in his relations to those whom he counsels. In my judg- 
ment this ought to be a requirement of every clinical psychologist, 
but is especially necessary for the educational psychologist. A person 
who wishes to become an educational psychologist may need special 
treatment in order to insure that his relationships will be an asset 
and not a handicap in his work. 








With the formation of the American Association for Applied 
Psychology we hail the recognition of a group of persons to serve 
education who are equipped with technical psychological knowledge 
on the one hand, and technical skill and proficiency on the other. 
This Association should appeal to those educational psychologists who 
see their main opportunity for service not in teaching educational 
psychology in the classroom, in writing texts and exercise books in 
educational psychology, nor in the construction of new tests, but in the 
application of these principles and procedures to actual educational
situations. We must recognize that the scientific contributions of 
educational psychology in the past have been directed toward a better 
understanding of the more minute problems of learning. Within 
recent years evidence has accumulated that success with school 
learning is bound up with the satisfaction of needs of the total per- 
sonality. It is hoped, therefore, that educational psychologists will 
not confine themselves exclusively to the minutiae of specific learnings, 
but will devote attention to the problems of personality adjustment 
in the larger sense, with the promise that in this direction lies the 
possibility of making the greatest contribution to human well-being 
and happiness. 











AN EXPERIMENTAL ANALYSIS OF THE ALLEGED 
CRITERIA OF INSIGHT LEARNING 


LOUIS A. PECHSTEIN AND FORREST D. BROWN 


University of Cincinnati 


The primary function of the school is that of providing an environ- 
ment in which learning may take place in the most efficient manner. 
Both the materials and the methods of education are conditioned by 
the nature of this learning process. It follows that the most funda- 
mental question to be answered by educators is this: What are the 
essential characteristics of learning behavior? In reply to this question
there are varied and contradictory descriptions. Two contemporary
schools of psychology* have contributed much in the way of
laboratory experimentation and theoretical interpretation of learning 
behavior. However, these two accounts apparently are in radical 
disagreement. The one describes learning in terms of trial-and-error, 
while the other denies that learning takes place in this manner and 
advances the theory that all learning is to be described in terms of 
insight. 

The first theory, that of learning by trial-and-error, was first 
presented fully by Thorndike in his epochal experiments on animal 
intelligence.⁴ According to this account, all learning proceeds by
trial-and-error activity during which the learner tries out all the 
reactions which make up the repertoire of his congenital and learned 
responses. The learner gradually varies his responses to the situation 
until the correct response is made. This successful act first comes by 
chance insofar as the time of its initial appearance is unpredictable. 
The acquisition of this correct response in a complex learning problem 
is gradual—a gradual process of eliminating errors and acquiring 
correct responses. 

The second theory, that of learning by insight, had its origin 
in German psychological experimentation. According to Koffka,¹
Kohler,² Wheeler,⁶,⁷ Ogden,³ and others of the Gestalt school, learning
proceeds by sudden adaptations, or insights, rather than by
trial-and-error. 





*The writers take the view that the theory of conditioning, advanced by 
Pavlov and Bechterew, is a physiological account of learning. Pavlov has insisted 
that he is a physiologist, not a psychologist. Thorndike has recently pointed out 
some fundamental reasons why this theory cannot be taken as the basic account 
of all learning behavior. (5: 107-110) 

The general purpose of this study is to analyze experimentally the 
criteria which have been advanced in description of insight learning.*
Before stating the problem more specifically it will be helpful to sum- 
marize the most important of these criteria. 


THE CRITERIA OF INSIGHT LEARNING 


Immediate Solution—One of the criteria descriptive of insight 
learning has to do with the time interval involved in the learning 
process. According to this theory, if the solution of the problem is 
within the capacity of the subject, learning will take place immedi- 
ately. Wheeler gives this as one of the criteria of insight: 


Consistently responding to novel aspects of situations correctly for the 
first time. (6: 519) 


Yerkes states this criterion in the following manner: 


Insight is used throughout this report to designate varieties of experience
which in us are accompaniments of sudden, effective, individually wrought
adaptations to more or less distinctly new and problematic situations.
(8: 155)


Sudden Solution.—When the situation is complex enough that the 
subject cannot solve it immediately, the insight theory holds that the 
adaptive act will appear suddenly. Wheeler states this criterion as, 


Sudden formation of configurational responses. Rapid rises or falls in 
the learning curve, depending upon the fashion in which the curve is plotted; 
sudden solving of problems when responses were previously controlled by 
chance distribution of conditioning factors; sudden elimination of long routes 
to the goal; sudden abandoning of wasteful procedures. (6: 519-520)


Yerkes states this criterion in similar terms: 


Appearance of critical point at which the organism suddenly, directly, and 
definitely performs required adaptive act. (9: 156) 


Koffka likewise says, 


To be sure, a true solution often follows after a perplexed period of trial- 
and-error; but in this case the difference is even more striking, for the animal 
suddenly gives a start, stops a moment, and then proceeds with a single 
impulse in a new direction to the attainment of the goal. (1: 181-182) 





* For a full report of this study see Forrest D. Brown: An Experimental Analysis
of the Alleged Criteria of Insight Learning. Unpublished Doctor’s thesis, University
of Cincinnati, 1933.


Response to the Situation as a Whole.—The insight theory holds 
that the learner sees and responds from the beginning to the situation 
as a whole. Wheeler sets forth this as a criterion of learning by
insight: 


The perception of a goal in its relation to a total stimulus-pattern and self 
propagation toward it. (6: 519) 


Ogden also says, 


Learning as here described is not a process of accretion... It is a 
pattern and not a content which analysis reveals in the destruction of a previous 
whole, and in the creation of a new, though subordinate, whole. Likewise, a 
synthesis of two or more wholes is effected pattern-wise and not by mere 
accretion. (3: 251) 


Response to the Meaningful Relationships in a Situation—The 
advocates of insight consider the response to relations between the 
elements in a situation as a criterion of insight. Wheeler states this 
criterion as, 


Responding to the constant but abstract features of changing stimulus- 
pattern; learning the relatively brightest light of a changing combination; 
“transfer” effects. (6: 519-520) 


Koffka, interpreting Kohler’s work, says, 


Thus Kohler contends that the manipulation of things with reference to 
their important material relations can be employed as a criterion of behavior 
with insight—that is to say, of intelligence. (1: 217) 


Evidence of Mental Activity—Those describing learning in terms 
of insight contend that if there is evidence of mental activity preceding 
the overt adaptive act, that is indicative of insight. Yerkes lists 
this as one of the criteria, 


Survey, inspection, or persistent examination of problematic situation; 
hesitation, pause, attitude of concentrated attention. (9: 156) 


Wheeler also says, 


Periods of initial delay prior to the execution of a new performance, that is, 
hesitating while studying a novel situation. That is presumably a symptom 
that the configuration is forming. (6: 520) 


Absence of Random Activity and Chance as Primary Factors.—To 
the above criteria should be added two additional ways in which 
insight learning is identified and is described as being opposed to
learning by trial-and-error. First, the insight theory has no place for
random activity in the learning process. Wheeler says, 


A random, or trial-and-error method, of attacking any problem, is unnec- 
essary and wasteful. When an animal or human being adopts such a method 
it is obvious that the problem is too difficult to fit the subject’s level of insight. 
A new situation should be met successfully the first time. (7: 115) 


Secondly, the proponents of learning by insight hold that, although 
chance and insight are not opposed to one another—for insight fre- 
quently comes by means of chance—insight is prerequisite and funda- 
mental. Koffka, presenting this point of view, says, 


Instead of the solution arising first by chance, and thereafter becoming 
more or less “‘understood,’’ understanding, or an appropriate transformation of 
the field, precedes the objective solution. (1: 205) 


PROBLEMS FOR INVESTIGATION 


As a basis for experimental investigation, the hypothesis is pro- 
posed that these criteria commonly used to identify learning by 
insight hold only when the learner is able to transfer to the situation 
a solution reached during an earlier period of trial-and-error learning. 
In other words, it is suggested that all instances of alleged insight 
learning are merely examples of trial-and-error plus the transfer of 
past experience. 

The specific questions to which this study is directed are: (1) Will 
learning take place either immediately or suddenly when the subject 
lacks past experience with the essential elements in the situation? 
(2) Does the subject ever react to the relations in a situation or to the 
situation as a whole except on the basis of associations of these ele- 
ments which have been previously learned by trial-and-error and are 
now transferred to the situation? (3) What is the rôle which chance
(resulting from random activity) plays in the learning process? Is 
insight as a mode of learning prerequisite to chance or is insight merely 
the end product (the learned act) resulting from chance success? 
(4) Is there any essential characteristic of learning behavior for which 
the insight hypothesis accounts and which cannot be described ade- 
quately by the trial-and-error theory plus the transfer of past 
experience? 


EXPERIMENTAL PROCEDURE 


The learning situations were designed so that all the essential
elements in the problems were visible from one point. There were
no hidden mechanisms. It was possible to solve the problems mentally.
Some of the problems formed a sequence, graduated in complexity,
so that transfer of experience was possible.

The first subject was a female gorilla, Susie, approximately six 
years of age. The second subject was a male chimpanzee, Romeo, 
about four and one-half years of age. Eight children, five boys and 
three girls, ranging from two years six months to seven years three 
months in age, comprised a final group of subjects. 

The first series of experiments was a duplication of certain food
and stick problems previously used by Kohler² experimenting with
chimpanzees and by Yerkes⁸,⁹ studying the gorilla, Congo. The
second series of learning situations consisted of a puzzle-box locked 
by means of bars arranged in various ways. The children served as 
subjects only on this last series of problems. 

Single-stick Problem.—The first food and stick problem consisted 
of food outside the cage placed beyond the reach of the animal. In 
the case of the gorilla this food was placed on a table which was bolted 
to the bars of the cage. For the chimpanzee the food was put on the 
floor beyond the bars. A stick, two feet long, was placed half in and 
half out of the cage, and pointing toward the food. During each 
setting of the problem the animal was kept by its keeper at the far 
end of the cage. The keeper and experimenter remained outside 
the cage. 

The criterion of learning was arbitrarily set as four successive 
solutions with no errors, that is, no irrelevant movements. This 
would correspond to Kohler’s description of intelligent behavior as 
that ‘‘which takes account from the beginning of the lie of the land, 
and proceeds to deal with it in a smooth, continuous course.” (2: 198) 
This same criterion was used in all the subsequent problems. 

The food and stick problem was solved by the gorilla, Susie, in 
fourteen trials, distributed over six days. Trials nine and eleven to
fourteen inclusive were without error, and required not over twenty- 
five seconds in each case. The longest trial, eleven, required fifteen 
minutes, thirty seconds; and the shortest required only eight seconds. 
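
The criterion of four successive errorless solutions can be checked mechanically against the record just given. The sketch below is illustrative only: the function name and data layout are hypothetical, and it assumes that every trial not listed as errorless contained at least one error, which the text implies but does not state outright.

```python
def trial_reaching_criterion(errorless, run_length=4):
    """Return the 1-based number of the trial on which a run of
    `run_length` successive errorless trials is first completed,
    or None if the criterion is never met."""
    streak = 0
    for trial, ok in enumerate(errorless, start=1):
        streak = streak + 1 if ok else 0  # any error resets the run
        if streak >= run_length:
            return trial
    return None


# Susie's record as reported: fourteen trials, with trials nine and
# eleven to fourteen free of error (all others assumed to contain errors).
susie_errorless_trials = {9, 11, 12, 13, 14}
susie = [t in susie_errorless_trials for t in range(1, 15)]

print(trial_reaching_criterion(susie))
```

On this reading the errorless ninth trial does not count toward the criterion, because the error on trial ten breaks the run; the criterion is first satisfied on trial fourteen, which agrees with the statement that the problem was solved in fourteen trials.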

Susie’s first reaction was to reach for the food with her hand.
When this proved futile she then picked up the stick and began to
play with it. In the course of this random play activity Susie waved
the stick around in the air and on the table in the direction of the food.
In the course of this behavior the stick accidentally struck the food
in such a way as to knock it within reach. However, Susie was not
looking at the food at the time but presently, noticing the food closer,
reached it with her hand. In trial two, following much irrelevant
play, Susie was shoving the stick about on the table when the farther
end got behind the food. The stick was at an approximate forty-five
degree angle to Susie and in pulling the stick toward her, she raked in
the food. Her attention was directed toward the table this time. In
subsequent trials Susie gradually learned to secure the food in this
manner. However, there were several interludes during which she
would reach for the food with her hand, wave the stick in the air, and
engage in varied irrelevant play.

[Figure 1 plot omitted; vertical axis: minutes, horizontal axis: trials, grouped by day.]

Fig. 1.—Learning curve of the gorilla on the food and stick problem, in terms of time.


[Figure 2 plot omitted; vertical axis: minutes, horizontal axis: trials.]


Fig. 2.—Learning curve of the chimpanzee on the food and stick problem, in terms of 
time. 





Learning took place neither immediately nor suddenly, but rather 
it emerged gradually (see Fig. 1). Only in the final trials can it be 
said that Susie “saw what to do and then did it.” Chance played an
important rôle in the original solution. Susie did not perceive or
react to the relationships in the situation until such relationships were
learned, and learned gradually, by experience gained in the course of
overt trial-and-error behavior. It is interesting to note the fact that
Susie, on her own initiative, learned to use the stick to secure food,
whereas Congo, the gorilla which Yerkes studied, was able to do so
only after much tutoring.® 

Faced with the same problem, the chimpanzee achieved success 
far more quickly. Out of eight trials, only the first two required 
over fifteen seconds. The first trial began with eight minutes of 
play with the stick, the food being ignored. Dropping the stick, 
Romeo then reached for the food with his hand. Being unable to 
obtain the food, the chimpanzee began playing again. At the end 
of about one more minute Romeo returned to the food, this time with 
the stick in his right hand and without a false move he raked in the 
food. In the second trial, after approximately one minute of play, the 
chimpanzee attempted to secure the food with the stick in his left 
hand but pushed the food too far to the right. The stick was promptly 
shifted to the right hand and used to secure the food. The sub- 
sequent trials were without errors. 

For the chimpanzee, this apparently was not a real problem. Dis- 
regarding the play activity, during which playing with the stick 
became the goal, the solution may be said to have taken place almost 
immediately, certainly suddenly. The sudden drop in the learning 
curve (see Fig. 2) indicates behavior quite different from that dis- 
played by the gorilla in the course of the earlier learning trials. How- 
ever, there is evidence that Romeo could have transferred to this 
problem situation certain experience which he had acquired in previous 
trial-and-error activity. In the cage there is always an abundance of 
rice straw. This straw is very long and heavy. In the course of 
subsequent modifications of this experiment and others the chim- 
panzee often used this straw as a substitute for the stick.* Would it 
not be nearer the truth to say that Romeo was using the stick as a 
substitute for the straw? There was also an abundance of oppor- 
tunity for the chimpanzee to learn by imitation, since the keeper and 
others daily used sticks and similar implements in the cage and sur- 
rounding cages. In other words, immediate solution was made pos- 
sible by transfer of past experience gained in similar situations, which 
past experience was essentially trial-and-error in nature. Accepting 
this interpretation we must conclude that Fig. 2 represents only a 
segment of the learning curve, the learned act itself having a much 





* Kohler reports this same observation in his experiments, but fails to take
into account the part which this past experience undoubtedly played in the solution
of the experimental learning problems.²







longer genetic history. In fact, it may be said that the behavior here 
exhibited represented the learned act, not the learning process. 

The accuracy of this hypothesis can be checked only by intro- 
ducing the following problem in which there is a distinctly new element. 

Double-stick Problem.—The second problem was a duplication of 
Kohler’s double-stick problem.² Only the chimpanzee was used as a
subject, since the gorilla’s destructive tendencies eliminated her. Two 
sticks were provided, either one being too short to reach the food. 
However, one stick was larger around than the other and was hollow. 
The smaller stick could be fitted into this larger one in such a way as 
to make a double-stick which would be long enough to reach the food. 

Romeo solved this problem in eleven trials distributed over four 
days. No solution was made in the first trial, which was discontinued 
at the end of thirty minutes. All responses toward the food were first 
in terms of past experience. Romeo tried to secure the food by using 
one or the other of the sticks. Being unable to do this he spent most 
of the time playing with the sticks, returning to the food occasionally 
and making futile attempts to rake it into the cage. 

During this trial, however, there was some very significant behav- 
ior. Romeo seemed quite curious about the hollow stick. He 
pushed his thumb into the opening, stuck straw into it, and repeatedly 
attempted to insert the smaller stick. There was no evidence that 
this was done in order to make an implement which could be used to 
secure the food. No attention was given to the food during this 
particular random behavior. 

In the second trial, and after much play, Romeo succeeded in 
getting the smaller stick into the larger one. He then played with 
this double-stick until it fell apart. During the course of this behavior 
of putting the sticks together and pulling them apart, the chimpanzee 
often came to the food and tried to reach it or rake it in with one stick. 
At the end of approximately seventeen minutes Romeo happened to 
come to the food at a time when the two sticks were together. Using 
this jointed-stick he succeeded in raking in the food. In the following 
trials the chimpanzee gradually learned the relation between success 
in obtaining the food and the two sticks joined. 

At the end of the learning process, the chimpanzee’s behavior 
displayed insight, that is, understanding of the relation of the sticks 
joined and the food obtained. However, the learning process was 
essentially trial-and-error in nature. The first solution came entirely 
by chance. The significance of this chance success emerged gradually 







and in no way preceded experience gained in the course of overt, 
random trial-and-error activity. 


Puzzle-box Problems.—The remaining problems required the open- 
ing of a puzzle-box. The gorilla, the chimpanzee, and the children 
served as subjects. The problems were arranged in a sequence of 



























































a, b, c are the bars locking the box;
d, e, f are the catches;
z is the lid; and
y is the stationary part of the top of the box.
Fig. 3.—Top view of the puzzle-box showing the lid locked by the three bars.


Section AA (see section AA of Figure 3)
Fig. 4.—Drawing of the catch by means of which the bars were locked.





difficulty so that transfer effects might be observed. A heavy wooden 
box, fifteen inches by fifteen inches by fifteen inches, was fitted with a 
hinged lid, which was locked by one or more iron bars which slid 
under catches fastened to the box (see Figs. 3 and 4). Before the box 
could be opened the bars had to be moved free from the catches which 
held them. Within the box was placed a rubber ball, for the gorilla 







and children, and food, for the chimpanzee. Before each subject 
began to work on the series of problems, a trial was given with only 
the one bar on the lid and no catch on the box. In this way the 
subject came to know that the lid could be lifted and that there was a 
ball (or food) inside the box. 

In the first problem bar “a” (Fig. 3), which had to be moved to
the right, locked the box. The second problem added a duplicate of 













































































a, b, c are the bars locking the box;
d, e, f are the catches;
z is the lid; and
y is the stationary part of the top of the box.
Fig. 5.—Top view of the box showing the lid locked by the three inter-locking bars.


this bar, “b,” placed on the right side of the first bar. The third prob-
lem added a third bar, “c,” placed at the left of the other two bars.
This third bar was like the others except that it had to be moved to the
left. The fourth problem was more complicated, requiring three
inter-locking bars (see Fig. 5) which had to be moved in a certain
order, first bar “a,” then bar “b,” then bar “c.” If the subject proved
unable to learn this problem of the three inter-locking bars, bar “a”
was taken off, making the problem a two inter-locking bar one. Once
this simpler problem had been learned, the original three inter-locking







bar problem was re-presented. Only in the case of the gorilla and the 
youngest child, a boy aged two years six months, was this necessary. 

The behavior of the gorilla, Susie, is perhaps of greatest interest 
and her learning was essentially the same as that of the chimpanzee. 
Susie achieved learning in the single bar problem, requiring nineteen 
trials distributed over three days. At first Susie responded only in 
terms of her past experience, that is, she lifted on the bar. When 
this response proved futile she resorted to random trial-and-error 
activity. She beat on the top of the box, picked at the hinges, chewed 
a corner of the box, picked at the band which fastened the box to the 
cage. These first responses of the gorilla may be said to be to the 
situation as a whole in the sense that Susie responded to everything 
in the situation about equally (the hinges, the iron band around the 
box, the bars, etc.). This, however, was a function of the newness of 
the problem—a lack of knowledge of the essential and meaningful 
elements concerned. Learning, as such, proceeded analytically as 
chance acts in the course of experimental behavior resulted in solution. 

The first solution came quite by chance. The following is the
description recorded at the time: “Taking hold of the bar with her
right hand, she moves it back and forth several times, leaving the
bar beneath the catch. During this behavior Susie’s eyes are focused
on something outside the cage. Accidentally she pushes the bar clear
of the catch with her body, but does not notice this. Presently she
lifts on the bar, opening the box.”

The second trial ended in much the same way. Susie was not 
looking at the box when she moved the bar correctly, but feeling the 
bar move, she immediately looked down and then lifted the lid. 
From this beginning an automatism developed in the course of the 
next ten trials. Susie regularly came to the front of the box, attempted 
to lift up on the bar, then she would move to the left corner of the box 
and push the bar clear while she looked not at the box but off to the 
left. Immediately she would lift the lid. In the fifteenth trial Susie 
was looking at the box when she moved the bar. After this trial her 
attention was always directed toward the box and the next four trials 
were without errors. 

Learning in the two-bar problem meets one of the criteria most 
commonly set up to designate insight learning, that is, learning took 
place immediately. The gorilla attacked the new bar, “b,” first,
moving it correctly, then moved the original bar and opened the box. 
This was done four successive times without error, the time ranging 







from seven to three seconds. The solution came immediately because 
bar “b” was an exact duplicate of bar “a,” making possible a complete
transfer of past experience which proved sufficient to achieve solution. 
Immediate solution was a function of the transfer of past experience, 
which experience originally had been gained by overt trial-and-error. 

When a problem contains a distinctly new element, will the solu- 
tion come without experimentation? This question is answered by 
the behavior exhibited in the three-bar problem. The new bar, 
“c,” had to be moved to the left. Susie mastered this problem in ten
trials, the first trial requiring one minute, forty-seven seconds, and 
the last trial taking only three seconds. Susie’s responses to this 
problem were first in terms of her past experience. Repeatedly she 
tried to move the new bar in the same direction in which she had suc- 
cessfully moved the other two. When this failed, the gorilla resorted 
to trial-and-error behavior, during which she chanced to move the new 
bar in the correct way. Several repetitions of this chance solution 
were necessary to effect learning. Only gradually did the incorrect 
responses drop out. The method of opening the box when learning 
was complete was as follows: Bar “b” was taken in the right hand and
bar “c” in the left hand and cleared simultaneously, then bar “a”
was moved with the right hand.

Confronted with the three interlocking bars, Susie managed to 
open the box in the course of eight trials distributed over three days. 
However, the solutions came entirely by chance and there was no 
evidence that learning was taking place. In the ninth trial Susie, 
meeting with complete failure to achieve solution, began to display 
anger, beating her chest in gorilla fashion, gritting her teeth, and 
chewing the box. Finally she left the box entirely, refusing to work 
at the problem. 

On the following day Susie was presented with the two interlocking 
bar problem, and achieved learning in six trials. On the next day 
when the three interlocking bar problem was re-presented, it was 
promptly learned in six trials. The method of opening the box in the 
final trials was as follows: Bar “a” was moved with the right hand,
then bar “b” seized with the right hand and bar “c” grasped with the
left hand and both cleared simultaneously. The time required ranged 
from three to four seconds in these last trials which were without error. 

There are four stages in the learning behavior exhibited by the 
gorilla in this series of puzzle-box problems. (1) First, learning 
achieved by trial-and-error, where there is no past experience with 







the problem situation; (2) secondly, immediate solution without overt 
trial-and-error, made possible by the transfer of past experience; (3) 
thirdly, failure to achieve learning in a complicated problem containing 
radically new elements, though solution, as such, took place several 
times by chance; (4) finally, learning of this complicated problem 
after the subject had been allowed to acquire experience with some 
of the essential new elements in a simpler problem situation. This 
final success took place by trial-and-error plus the transfer of past 
experience. 

The behavior of the chimpanzee and of the children on this series 
of puzzle-box problems may be summarized briefly. Overt trial-and- 
error was evident, except with the older children. Here was evidence 
that this experimental activity took place on the mental level, but 
was none the less trial-and-error in nature. For instance, one of the 
children said: “Oh, I see, this one won’t let me unfasten that one.”
(Meaning the bars.) In the simpler problems a child occasionally 
opened the box from the start without error. However, the explana- 
tion in each case seemed to lie in transfer of past experience. For 
instance, one child said: “This is like the latch on our gate.”

It is interesting to note that the behavior of the two-and-a-half- 
year-old child resembled very closely that of the gorilla, the child 
failing to achieve learning in the three interlocking bar problem until 
the acquisition of experience on the two interlocking bar problem.
The behavior of the chimpanzee differed from that of the gorilla in 
only two significant ways. First, he achieved learning in the three 
interlocking bar problem without the necessity of the two interlocking 
bar problem. Secondly, he worked in the experimental situations 


much more slowly. However, in no instance was Romeo’s learning 
either immediate or sudden. 


SUMMARY AND CONCLUSIONS 


From the experimental findings some very definite answers may be 
derived to the questions which have been raised. 

1. Learning never takes place immediately when the problem is, in 
reality, new. The learner resorts first to behavior experimental in 
nature. Immediate solution is a function of the transfer of past 
experience, which experience was acquired originally by trial-and- 
error activity. 

2. Learning proceeds analytically in the sense that reactions to the 
significant elements in the situation emerge from the reactions made 




to the situation as a whole. In the beginning all phases of the prob- 
lem are reacted to more or less equally. Only gradually, and by 
means of experience gained in the course of random behavior, are the 
significant elements selected out and the others neglected. 

3. Chance plays a dominating rôle in learning in the sense that
the appearance of the adaptive act is unpredictable. In the course of 
trial-and-error behavior the learner chances to do certain things which 
effect solution and by repetition result in learning. Insight in no way 
precedes these chance successes but, rather, insight is the end-product 
(the learned act) resulting from these chance adaptive acts. 

4. The criteria of insight learning do not differentiate this as a 
mode of behavior from trial-and-error. The term insight may be 
used only to describe the fact of learning (the end-product) and not as 
descriptive of the learning process, that is, how learning takes place. 


REFERENCES 


1. Koffka, Kurt: The Growth of Mind. New York: Harcourt Brace and Company, 
1924, pp. xvi + 382. (Ogden Tr.) 

2. Kohler, Wolfgang: The Mentality of Apes. New York: Harcourt, Brace and 
Company, 1925, pp. vir + 342. (Winter Tr.) 

3. Ogden, Robert M.: Psychology and Education. New York: Harcourt Brace and 
Company, 1926, pp. xu + 364.

4. Thorndike, E. L.: Animal Intelligence, Psychological Review Monographs, Vol. 
11, No. 5, May, 1901, pp. 1-57. 

5. Thorndike, E. L.: Human Learning. New York: The Century Co., 1931, pp. 
203. 

6. Wheeler, R. H.: The Science of Psychology. New York: Thomas Y. Crowell 
Company, 1929, pp. xv + 356. 

7. Wheeler, R. H.: Readings in Psychology. New York: Thomas Y. Crowell 
Company, 1930, pp. x + 597. 

8. Yerkes, R. M.: The Mind of a Gorilla, Genetic Psychology Monographs, Vol. 1, 
January, March, 1927, pp. 1-193. 

9. Yerkes, R. M.: The Mind of a Gorilla: Part II, Mental Development. Genetic
Psychology Monographs, Vol. II, July, 1927, pp. 379-551.













STANDARDIZATION OF A VALUES INVENTORY 


A. C. VAN DUSEN, STAN WIMBERLY, AND CHARLES I. MOSIER 


University of Florida 


Spranger intuitively set up six types of men, distinguished by their 
prevailing evaluative attitudes; namely, Social, Theoretical, Aesthetic, 
Religious, Political, and Economic, without attempting any test 
scheme to differentiate between them. Allport and Vernon,’ basing 
their classification directly upon Spranger’s Types of Men,‘ con- 
structed a test to measure the relative prominence of the six basic 
interests or motives in personality. The test is a self-rating scale and 
does not readily lend itself to standardization since only intra-indi- 
vidual comparisons are possible. This comparison of an individual 
with himself seems of doubtful value unless it is certain that these 
types or traits are the only ones, and that no others are possible. 

Lurie has applied Thurstone’s multiple factor technique to this 
problem of classification in his ‘‘Study of Spranger’s Value-Types by 
the Method of Factor Analysis.’’! Starting with Spranger’s rational 
classification as a first approximation, he constructed a battery of tests 
to differentiate the various traits, and performed a factor analysis upon 
the data to determine what primary factors existed. His study 
revealed four major and three minor factors. The major factors were 
(1) Social, (2) Philistine, a combination of Spranger’s Economic and 
Political, (3) Theoretical, and (4) Religious; while the minor factors 
were (5) Open-mindedness, (6) Practicality, and (7) Aesthetic. 

Lurie’s test consisted of one hundred forty-four items, twenty-four 
corresponding to each of Spranger’s six types. The material used 
was similar to that employed by Allport and Vernon, since it offered 
a fairly wide sampling of the subject’s opinion. Lurie points out that 
the material may be arbitrarily classified as dealing with: ‘‘(a) The 
present interests; (b) the ideals; (c) the preferences with regard to people; 
(d) the beliefs and opinions of the person taking the test.’”’ His results 
indicate that the first three classifications show the highest correlation 
with the primary factors. 

The preparation of the items for each trait involved the selection 
of six items representing each of these four classes. The individual’s 
reaction to the various items of the test were recorded on a seven-point 
rating scale which has been adopted for the present study and which 
will be described later. Lurie’s analysis of the data obtained from 



his tests provides the basis upon which the Standards Inventory was 
constructed. 

It is the purpose of the present study to set up a series of items 
divided into five scales to differentiate the value-types found by Lurie. 
Four of these five scales were found by him to exist as primary factors; 
the fifth, Aesthetic, he designated as of secondary significance; but it 
has been included here since it is a value-type. It is the further pur- 
pose of this study to standardize these scales on the basis of the results 
obtained from an experimental group. 

The Standards Inventory consisted of ninety-six items, sixteen 
for each of Spranger’s types. Since Lurie found that the items for 
Political and Economic measured a common factor; namely, Philistine, 
these items were combined into one scale of thirty-two items which 
was called Economic. The test in its final form consisted, then, of
five scales (Economic, Social, Religious, Aesthetic, and Theoretical),
the items for which were selected directly from the first three sections
of Lurie’s battery; namely, interests, ideals, and preferences with regard
to people. Items from the fourth section, beliefs and opinions, were
entirely omitted since they were found in his analysis to have only low 
factor loadings with respect to the traits being measured. Only one 
group of the preferences with regard to people was retained. Four 
minor changes or substitutions were made in the items in the interest 
of a clear understanding by the subjects, e.g., “Croesus” was replaced
by “Mellon” in the economic scale.

The test was divided into ten item-groups of varying length. 
Group I requires the individual to indicate preference for teaching cer- 
tain subjects; algebra, e.g., is an index of theoretical interest. Group 
II concerns preference for reading material; national affairs under that 
group is considered an index of economic interests. Group III affords 
a choice of careers; the acceptance here of the item Musician pre- 
sumably indicates high aesthetic values. In Group IV various 
magazines are to be rated; the appeal of Christian Century to the 
subject suggests his religious values. Group V provides the oppor- 
tunity for rating famous personages; Florence Nightingale is a social 
item. Group VI inquires into the degree of pleasure which the subject 
would experience should his son in college use his spare time in one of 
various ways. Group VII calls for the subject’s rating of various 
careers for his children. Group VIII lists several ways for disposing 
of a considerable sum of money. Group IX deals with the appeal 







of certain lecture subjects, while Group X relates to the degree of 
preference for certain personality traits as applied to oneself. 

Lurie’s method of scoring was adopted without alteration. Scores 
for each item were obtained by having the subject rate his degree of 
acceptance on a seven-point scale: “Complete Rejection,” “Strong
Disapproval,” “Mild Disapproval,” “Complete Indifference,” “Mild
Approval,” “Strong Approval,” and “Complete Acceptance.” It was
assumed that the distances between these scale points were equal.
Weightings of zero for “Complete Rejection” to six for “Complete
Acceptance” were assigned.
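
The weighting scheme may be illustrated by a short sketch in Python. Only the seven scale labels and the weights of zero to six are taken from the text; the names `SCALE_POINTS`, `WEIGHTS`, and `item_score` are hypothetical and serve for illustration only.

```python
# Hypothetical names; only the seven labels and the 0-6 weights come from the text.
SCALE_POINTS = [
    "Complete Rejection",
    "Strong Disapproval",
    "Mild Disapproval",
    "Complete Indifference",
    "Mild Approval",
    "Strong Approval",
    "Complete Acceptance",
]

# Equal spacing between scale points is the assumption stated in the text,
# so each label receives its position in the list as its weight.
WEIGHTS = {label: weight for weight, label in enumerate(SCALE_POINTS)}


def item_score(response: str) -> int:
    """Return the 0-6 weight for one checked scale point."""
    return WEIGHTS[response]
```

Under the stated assumption of equal distances between scale points, “Complete Indifference” falls at the midpoint of the scale and receives a weight of three.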

The directions for the test appeared at the beginning as follows: 
“This inventory is an analysis of your likes and dislikes and interests.
There are no right answers and no wrong answers. In the appropriate 
space at the right, indicate by an ‘X’ for each item how you feel about 
that item.” 

The test was distributed among male students in introductory 
courses in Psychology at the University of Florida. Eighty-one 
blanks were satisfactorily filled out and returned and the standardiza- 
tion was based on these cases. 

It becomes necessary at this point to define the universe from which 
these eighty-one cases constituted a random selection. The limits of 
this universe should not be so wide (inclusive) that the sample is not 
representative, nor should it be so narrow (exclusive) that the test 
becomes inapplicable to any group other than the eighty-one subjects. 
In consideration of the above factors, we shall attempt to define a 
reasonable universe and postulate its probable influence upon some
features of the results to be obtained. First of all, this sample was 
taken from university men, but, more specifically, they were Univer- 
sity of Florida men. Since they were also drawn from introductory 
classes in psychology, the universe might conceivably be limited 
to University of Florida students who take psychology. Although 
psychology is included under the Arts and Science curriculum, it may 
be, and is, elected by students from the entire University. The 
universe, then, is probably best defined on the basis of the sample of 
eighty-one cases as students of the University of Florida, with a possi- 
bility of generalizing to university students at other American 
universities. 

The necessity of two postulates becomes evident in the attempt to 
make comparable the scores obtained on these scales. First, that a 







given scale point, such as “Mild Disapproval,” indicates the same
degree of preference for all of the sixteen items measuring Theoretical 
value, distributed throughout the test, as it does for any other group 
of sixteen, and, second, that the average general attractiveness of the 
items in one scale of sixteen items, apart from their trait significance, 
is the same as the average attractiveness of the items in the other 
scales. 

College men in general are selected in respect to Theoretical inter- 
ests as evidenced by their pursuit of higher education. Economic 
advancement, for many, constitutes a prime motive for the seeking of 
higher education. There may be some selective influence with regard 
to the social trait in so far as college education or general enlighten- 
ment affects interests in people and social relations. There seems to 
be no evident reason for assuming that the universe for this particular 
study is very different from the universe in general with regard to 
Religious and Aesthetic values. Therefore, it is reasonable to expect 
that the Economic, Theoretical, and Social traits should show higher 
mean values and smaller standard deviations than the Religious and 
Aesthetic traits. 

Tests were graded and raw scores recorded for each of the five 
traits on each blank. These raw scores were the sum of the response 
weightings for the items checked in each particular trait. A stenciled 
key was prepared for each of the five tests to facilitate the grading. 
Since the items on the Economic trait were thirty-two in number 
(Lurie’s battery on Philistine), instead of sixteen as in the other four 
traits, the score for this trait was obtained by dividing the summed 
scores of the responses to these thirty-two items by two, thus equating 
the five traits with respect to maximum raw score. 
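
The scoring rule just described may be sketched as follows. The item counts (thirty-two for Economic, sixteen for each other trait) and the halving of the Economic sum are taken from the text; the function and variable names are hypothetical, and the maximum item weight of six is the weight assigned to complete acceptance.

```python
from typing import Sequence

# Item counts per scale as stated in the text; the names are hypothetical.
ITEMS_PER_SCALE = {
    "Economic": 32,
    "Theoretical": 16,
    "Social": 16,
    "Religious": 16,
    "Aesthetic": 16,
}
MAX_WEIGHT = 6  # weight assigned to complete acceptance


def raw_score(trait: str, weights: Sequence[int]) -> float:
    """Sum the response weights for one trait, halving the Economic total."""
    if len(weights) != ITEMS_PER_SCALE[trait]:
        raise ValueError(f"{trait} expects {ITEMS_PER_SCALE[trait]} items")
    total = sum(weights)
    # The Economic scale has twice as many items, so its sum is divided by two.
    return total / 2 if trait == "Economic" else float(total)


def max_raw_score(trait: str) -> float:
    """Largest raw score obtainable on a trait."""
    return raw_score(trait, [MAX_WEIGHT] * ITEMS_PER_SCALE[trait])
```

With this rule every scale has the same maximum raw score of ninety-six (sixteen items at a weight of six), which is the sense in which the five traits are equated.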

Frequency distributions of these raw scores were tabulated and 
polygons were drawn. The medians, quartiles, and quartile deviations
were computed. Percentile scores were assigned to the raw scores to 
facilitate person-group comparison within each trait. The means and 
standard deviations were calculated for each trait. Reliability coeffi- 
cients were computed as follows: The whole test was correlated with 
the half test, i.e., every item was plotted against every other item.
This was reduced to the correlation coefficient for half the test with 
the other half by the formula: 


$$r_{yz} = \frac{\sigma_x r_{xy} - \sigma_y}{\sqrt{\sigma_x^2 + \sigma_y^2 - 2 r_{xy} \sigma_x \sigma_y}}$$







where x is the raw score on the whole test, y is the raw score on the
odd-numbered items of the test, and z is the raw score on the other
half. These results were corrected by the Spearman-Brown formula 
to give the reliability of the entire test of sixteen items.® 


$$r_{xx} = \frac{2 r_{yz}}{1 + r_{yz}}$$
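
The two reliability steps may be checked numerically with a brief sketch. It assumes that the whole-test score x is the sum of the two half scores y and z, in which case the correlation between the halves follows from the whole-half correlation as r_yz = (σ_x r_xy − σ_y) / √(σ_x² + σ_y² − 2 r_xy σ_x σ_y), and the Spearman-Brown step is r_xx = 2 r_yz / (1 + r_yz). All function names are hypothetical, and the half-test scores below are invented for illustration; they are not data from the study.

```python
import math
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def sd(values: Sequence[float]) -> float:
    """Standard deviation with divisor n (any consistent divisor works here)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Product-moment correlation of two equally long score lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)
    return cov / (sd(a) * sd(b))


def half_half_r(r_xy: float, sd_x: float, sd_y: float) -> float:
    """Correlation between the two halves, from the whole-half correlation."""
    numerator = sd_x * r_xy - sd_y
    denominator = math.sqrt(sd_x**2 + sd_y**2 - 2 * r_xy * sd_x * sd_y)
    return numerator / denominator


def spearman_brown(r_yz: float) -> float:
    """Reliability of the full-length test from the half-half correlation."""
    return 2 * r_yz / (1 + r_yz)


# Invented half-test scores, for illustration only (not data from the study).
y = [10, 12, 9, 15, 11, 14]
z = [11, 10, 8, 16, 13, 12]
x = [a + b for a, b in zip(y, z)]  # whole test = sum of the two halves

r_from_formula = half_half_r(pearson(x, y), sd(x), sd(y))
r_direct = pearson(y, z)  # agrees with r_from_formula up to rounding error
```

As a check against the reported figures, a half-half correlation of .55 gives a corrected value of about .71 under the Spearman-Brown step, which agrees with the reliability stated for the Economic scale.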





The tests were correlated with one another, and with the American 
Council on Education Psychological Examination,’ compared with the 
results of the Strong interests scores,® and with vocational choice. An 
item analysis was performed on the basis of a quartile differentiation 
for the scores on each trait. The median case of each distribution was 
omitted in order to equate the quartiles at twenty cases. The results 
here as Richardson* has pointed out make possible approximations to 
the correlation coefficient between each item and the total test. 

The results of this study are summarized in Table I. 








TABLE I

          Economic  Theoretical  Social  Religious  Aesthetic
Md         65.43      70.62      66.27     54.9       57.9
Q1         59.21      61.8       56.9      44.7       47.9
Q3         73.19      75.15      73.0      64.8       64.5
Q           6.99       6.67       8.05     10.0        8.8
M          65.8       68.72      65.7      54.5       57.1
σ           8.93       9.62      10.4      14.18      12.9
r_xy         .87        .92        .95       .92        .95
r_yz         .55        .67        .75       .67        .79
r_xx         .71        .80        .86       .81        .88
r_ACE       -.08        .03        .02      -.02        .17
E
T            .19
S            .26       -.04
R            .17       -.06        .61
A            .01        .18        .38       .31




















The means and standard deviations in Table I show that the group 
investigated is high in Economic, Theoretical, and Social values as 
compared with Religious and Aesthetic, confirming the hypotheses 







earlier made concerning the effect of the particular universe on the 
results. 

The Economic trait was found to have a reliability coefficient of 
.71. This was the lowest obtained for any test and it becomes more 
significant when it is made comparable to those of the other tests, i.e.,
adjusted to a length of sixteen items. The coefficient which is com-
parable to those of the other tests is given by r_yz (Table I) and is .55.
Lurie found that the items for Economic and Political in his test 
measured a common trait, which he named Philistine. Since we found 
a measure of this trait to be highly unreliable in relation to the other 
traits measured, the most probable explanation is that these thirty-two 
items in the present test do not measure a single trait, to an extent 
which would warrant combining the two sets of items into a single 
scale. According to Mosier? high reliability is a necessary but not a 
sufficient condition for the demonstration of a test as a measure of a 
single trait. This renders extremely doubtful the existence of Lurie’s 
‘“‘Philistine”’ as a unitary trait. 

The other coefficients of reliability ranged from .80 to .88. These 
are all too low for individual prediction as is seen from the following 
consideration: The standard error of estimate® is given by 


$$\sigma_{\text{est}} = \sigma_x \sqrt{1 - r}$$


where r is the coefficient of reliability. The standard error of estimate
of the Aesthetic value, the trait with the highest reliability, was found 
to be plus or minus 4.55. In the case of the individual whose true 
score was at the mean, fifty-seven, this would indicate a probability of 
two-thirds for his actual score appearing at some point within a range 
with a lower limit of 52.45 and an upper limit of 61.55. Furthermore, 
there remains a probability of one-third that his obtained score lies 
outside of this range. If these raw scores are reduced to percentile 
rank, the lower one is assigned a percentile value of .43 and the upper 
one, .70, a difference of twenty-seven percentile ranks. This is too 
great a variation for individual prediction, although group comparison 
might be satisfactory. 
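
The arithmetic of this argument may be sketched as follows, taking the standard error of estimate as the standard deviation multiplied by √(1 − r), where r is the coefficient of reliability. The function names are hypothetical. It should be noted as an assumption that the rounded Table I values for the Aesthetic scale (a standard deviation of 12.9 and a reliability of .88) yield approximately 4.47 rather than the reported 4.55; the difference presumably arises from unrounded values used in the original computation.

```python
import math
from typing import Tuple


def standard_error(sd: float, reliability: float) -> float:
    """Standard error of estimate: sd * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)


def two_thirds_band(true_score: float, se: float) -> Tuple[float, float]:
    """Range of one standard error on either side of the true score."""
    return (true_score - se, true_score + se)


# Rounded Table I values for the Aesthetic scale.
se_from_table = standard_error(12.9, 0.88)

# Band around a true score of fifty-seven, using the reported standard error.
lower, upper = two_thirds_band(57, 4.55)
```

With the reported standard error of 4.55, the band about a true score of fifty-seven runs from 52.45 to 61.55, the limits given in the text.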

Of course, if the test is given to a group where in any scale the 
standard deviation is larger than that reported here, the reliability 
will be correspondingly greater.® 

Correlations were run between the tests and scores on the American 
Council on Education Psychological Examination. The low and 







negative correlation coefficients obtained showed that the tests were 
not measuring intelligence or anything correlated with it. 

It was possible that one group of subjects were consistently inter- 
preting the scale-categories higher, while the other group consistently 
marked them lower. If this were the case, the tests would not measure 
the various traits, but would measure the general liberality or con- 
servatism of the individual in interpreting the scale categories. It fol- 
lows from this hypothesis that all the intercorrelations between the 
five tests would be definitely positive. Referring to the tabled results 
of intercorrelations, we see that seven out of eight of these are zero 
within the limits of sampling error. This refutes the hypothesis. 

The intercorrelations further show the degree of relationship 
between the different tests. Any significant degree of correlation 
between two values indicates that either one value is dependent upon 
the other, or that both of them depend upon something more funda- 
mental. The coefficients show that with the exception of the social 
and religious scales the tests are measuring separate traits. A cross 
correlation of the religious items with the social scores indicates that 
the high intercorrelation is probably the result of the social items hav- 
ing been so selected that they measure both the social and religious 
traits. Conceivably, this is due to a misinterpretation of the items, 
but, more likely, the two tests are measuring some common factor 
which might be postulated as “humanitarianism.” The low posi- 
tive intercorrelations which were found between aesthetic and social, 
and aesthetic and religious are in agreement with results found by 
Lurie and by Whitely.⁹

A further investigation of the validity may be made on the basis of 
the information obtained from the subject’s choice of a vocation, and 
from the Strong Vocational Interest Blank.⁵ This test purports to
measure the degree to which an individual’s pattern of interests is 
similar to those of men who are successful in the various vocations. 
A man’s interest in a vocation may, at least, be partially based upon 
his fundamental values. Examining the relationships existing between 
the five evaluative attitudes and the Strong scores, as well as voca- 
tional choice, we find: (1) Economic values have a high relation with 
Office Clerk, Schoolman, and Certified Public Accountant. The 
trait is negatively related to Engineer and has practically no relation- 
ship with Law. Little can be drawn from the vocational choices 
within this group. (2) Theoretical value is positively related to
Schoolman and Engineer. It shows low negative relation with Certi-
fied Public Accountant. (3) In examining the vocational choices of 
the Aesthetic group, seventeen in the highest quartile were found to 
have chosen an aesthetic vocation (such as Architecture, Actor, etc.) 
as compared to only three in the lowest quartile. (4) Social preference 
is positively related to Law, and Office Clerk, the latter being a voca- 
tion possessing no apparent a priori relationship with the trait. It 
has a high relation with Schoolman. Negative relation is shown with 
Engineer and practically no relation with Certified Public Accountant. 
(5) Religious has a high positive relation with Schoolman and Office 
Clerk (another unpredictable relationship) and zero relation with 
Law, Engineer, and Certified Public Accountant. One third of the 
upper quartile on Religious were interested enough to ask to have the 
Strong Blank scored on Ministry as compared to none in the lower 
quartile. No one in the lower quartile mentioned the ministry as a 
vocational choice, while six in the upper quartile chose Ministry as a 
vocation. 

The above results substantiate the claim that the tests are measuring
different evaluative attitudes and imply that, in general, these
measures are valid.

Finally, the item analysis showed a few items to be inconsistent 
with the trend of scores as measured consistently by the other items 
for each trait. This result carries an implication for both the validity 
and the reliability of the test. In the item analysis for the social scale, 
the sixteenth item (likable) showed a correlation with the total social 
value of only .06. All the other social items were much more highly 
correlated. We then have for this scale some fifteen items which are 
a reliable measure of something. Since they were chosen as indices of 
social value, the probability of their measuring something else is small. 
Some of the other scales were found to contain several items which 
had low correlations with their respective tests. In general, the ideals 
class of Lurie, which were distributed throughout all the scales, showed 
low correlation with the total test scores for the scales in which they 
appeared. This is probably due to the fact that these items are 
highly attractive to anyone regardless of his prevailing evaluative atti- 
tude. The substitution of items consistent with the scale for unsatis- 
factory items will increase the reliability for these tests, since the 
reliability of a test increases as the average correlation between the 
test and its items becomes higher.* 
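
The item analysis described above amounts to correlating each item with the total score of its scale and flagging items whose correlation is low. A minimal sketch follows; the ratings, the item names, and the cutoff of .30 are hypothetical and are not taken from the study, and the total here includes the item itself.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def item_total_correlations(items):
    """Correlate each item with the total score over all items of the scale."""
    n_subjects = len(next(iter(items.values())))
    totals = [sum(scores[i] for scores in items.values())
              for i in range(n_subjects)]
    return {name: pearson(scores, totals) for name, scores in items.items()}

# Hypothetical ratings of five subjects on three items of one scale.
scale = {
    "item_a": [1, 2, 3, 4, 5],
    "item_b": [2, 2, 3, 5, 5],
    "item_c": [3, 3, 2, 3, 3],
}
correlations = item_total_correlations(scale)
CUTOFF = 0.30  # assumed threshold, not a value from the study
weak_items = [name for name, r in correlations.items() if r < CUTOFF]
```

In this made-up example only `item_c` falls below the cutoff, so it would be the candidate for replacement in a revised scale.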










SUMMARY AND CONCLUSIONS 


In an attempt to obtain a measure of the five evaluative attitudes
(Philistine, Theoretical, Religious, Social, and Aesthetic) found by
Lurie to exist as factors in personality, a series of scales was
constructed and combined into one inventory. The test was given to
eighty-one University of Florida students, and scores were obtained
on each of the five scales.

Frequency distributions were tabulated on these raw scores and 
polygons drawn. The medians, quartiles, and quartile deviations 
were computed. Percentile scores were assigned to the raw scores. 
The means and standard deviations were calculated and the reliability 
coefficients were computed for each test. The tests were intercorre- 
lated, correlated with the American Council on Education Psycho- 
logical Examination, and compared with the interest scores on the 
Strong Vocational Interest Blank. Item analyses of each score were 
performed. 

The group investigated was found to exhibit higher Economic, 
Theoretical, and Social evaluative attitudes than Religious and 
Aesthetic. This is probably a function of a sampling effect in the par- 
ticular universe from which the subjects were drawn. 

The tests were all found to possess reliabilities too low to make the 
test useful in individual prediction, but high enough to permit group 
comparisons. 

The Economic and Political items of Lurie’s battery should not be 
combined in an additive manner to form a resultant Philistine trait. 
Care should be taken to find items which are a more valid measure of 
the Economic value. 

The tests are not correlated with intelligence scores. 

With the exception of the Religious scale, which was found to be 
somewhat correlated with Social scores, the tests were not significantly 
intercorrelated. 

The items in general are positively correlated with test score. The 
group of items taken from Lurie’s ideals class seemed, however, to be 
equally acceptable to all groups. 

The tests were found to be valid using external criteria. The first 
and fourth quartile of each trait was related to the scores on the Strong 
Vocational Interest Blank. Each evaluative attitude was found to 
be positively related, with a few minor exceptions, to those vocations 
which are naturally related to these various values. 







To construct a more reliable measure of the evaluative attitudes 
whose classification has been an issue since Spranger’s original intuitive 
classification, it is suggested that a revision of the test be made on 
the basis of the item analysis. Items having the least correlation and 
dispersion should be discarded in favor of better ones. Further check 
against external criteria would provide means of obtaining a higher 
degree of validity. 


BIBLIOGRAPHY 


1. Lurie, Walter A.: “A Study of Spranger’s Value-Types by the Method of
Factor Analysis.” J. Soc. Psychol., Vol. viii, pp. 17-37.

2. Mosier, Charles I.: “A Note on Item Analysis and the Criterion of Internal
Consistency.” Psychometrika, 1936, Vol. 1, pp. 275-282.

3. Richardson, M. W.: “Notes on the Rationale of Item Analysis.” Psychometrika,
1936, Vol. 1, pp. 3-30.

4. Spranger, E.: (English edition) Types of Men. Halle: Max Niemeyer Verlag,
1928.

5. Strong, Edward K., Jr.: Vocational Interest Blank. Stanford University Press,
Stanford University.

6. Thurstone, L. L.: The Reliability and Validity of Tests. Edwards Brothers, Inc.,
1935.

7. Thurstone, L. L., and Thurstone, T. G.: Psychological Examination. The
American Council on Education, Washington, D. C.

8. Vernon, P. E., and Allport, G. W.: “A Test for Personal Values.” J. Abn. and
Soc. Psychol., 1931, Vol. xxvi, pp. 231-248.

9. Whitely, P. L.: “A Study of the Allport-Vernon Test for Personal Values.”
J. Abn. and Soc. Psychol., 1933-1934, Vol. xxviii, pp. 6-13.













DOES TEST INTELLIGENCE 
INCREASE AT THE COLLEGE LEVEL? 


T. M. LIVESAY 


University of Hawaii 


This study presents data resulting from the retesting of fifty 
university students at the end of the Senior year in 1936. The tests 
were almost exactly four years apart as the first test was given in the 
Spring preceding college entrance. The test used was the Psychologi- 
cal Examination for College Freshmen, published by the American 
Council on Education—a test widely used by American universities 
as an admissions criterion.¹ The 1931 edition was used in both cases
in order to make the scores directly comparable, and it is assumed 
that any practice effect was dissipated by the four year interval 
between tests. It is certain that none of these students had oppor- 
tunity to see any form of these tests during the interim. A larger 
number of cases would naturally have been more reliable, but the 
second test was entirely voluntary and only fifty responded. 

The age range at the time of the first test was from fifteen years 
and seven months to twenty-seven years and five months, with the 
median at exactly eighteen. The one extreme case of high age was a 
woman who had finished high school some ten years before and 
married shortly afterward. The sex division was fairly even as there 
were twenty-eight men and twenty-two women. 

Every individual gained in total score, the gains ranging from five 
to ninety-nine points. However, except in the case of the Completion 
test, the picture is rather different with the subtest scores. In Artificial 
Language twelve showed a loss in score while two remained the same; 
in Analogies five lost and one made the same score; in Arithmetic 
fourteen lost while four remained the same; and in Opposites three 
showed a decrease in score. 

Table I gives the means, standard deviations, gains (differences 
between means), critical ratios, and chances of true differences for the 
men, women, and total group for the five subtest scores and total score. 
All three groups gained in each instance, with the men gaining more 
than the women in all except the Opposites test. However, it should 
be noted that the women had higher average scores on both the first 
and second tests in every case except Arithmetic. In eleven of the
eighteen comparisons the critical ratios indicate complete reliability.

1 For reliabilities see Table IV.
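
The critical ratios of Table I appear consistent with dividing each gain by the standard error of the difference between two means, with the two means treated as uncorrelated. This reading is inferred from the tabled figures and is not stated in the text. The sketch below checks it against two rows of the table; the function names are illustrative, and the normal-curve conversion to chances in one hundred only approximates the Garrett table that the author cites.

```python
import math

def critical_ratio(gain, sd_first, sd_second, n):
    """Gain divided by the standard error of the difference of two means.

    Assumes the same n on both tests and ignores the correlation
    between first and second scores.
    """
    se_diff = math.sqrt(sd_first ** 2 / n + sd_second ** 2 / n)
    return gain / se_diff

def chances_in_100(cr):
    """Normal-curve chances in one hundred that the true difference exceeds zero."""
    return 100.0 * 0.5 * (1.0 + math.erf(cr / math.sqrt(2.0)))

# Total score, whole group (n = 50), and Completion, men (n = 28), from Table I.
cr_total = critical_ratio(44.80, 47.010, 47.930, 50)
cr_men_completion = critical_ratio(14.46, 10.295, 10.610, 28)
```

Both values round to the tabled critical ratios of 4.72 and 5.18. A critical ratio of 1.75 converts to about 96 chances in one hundred, as in the table, although some tabled entries differ from the normal-curve value by a point.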

Table II presents the means, standard deviations, gains, critical 
ratios and chances of true differences for the two age groups—under 
eighteen (25) and over eighteen (25). This is, of course, a rather 
broad division but was necessary because of the limited number 
of cases. 


TABLE I.—GAINS IN SCORES BY SEX AND FOR THE WHOLE GROUP

                                              Means       Standard deviations                     Chances in one hundred
Tests               Groups               First   Second     First    Second    Gains¹  Critical   of a true difference
                                          test     test      test      test             ratios    greater than zero²

Completion          Men (28)             25.04    39.50    10.295    10.610    14.46     5.18       100
                    Women (22)           30.41    43.82    11.910    12.205    13.41     3.69       100
                    Whole group (50)     27.40    41.40  [illeg.]  [illeg.]    14.00     6.10       100

Artificial          Men                  37.00    43.25    12.535    14.180     6.25     1.75        96
language            Women                43.82    48.14    17.090    14.610     4.32     0.90        82
                    Whole group          40.00    45.40    15.100    14.575     5.40     1.82        96

Analogies           Men                  24.14    33.61    12.060    12.030     9.47     2.94       100
                    Women                31.09    37.23    12.760    10.495     6.14     1.74        96
                    Whole group          27.20    35.20    12.845    11.520     8.00     3.28       100

Arithmetic          Men                  24.50    28.97    10.895    15.490     4.47     1.25        89
                    Women                19.27    22.91    11.940    15.420     3.64     0.88        82
                    Whole group          22.20    26.30    11.660    15.750     4.10     1.48        93

Opposites           Men                  34.86    47.18    15.025    14.175    12.32     3.16       100
                    Women                41.55    56.09    17.250    14.430    14.54     3.03       100
                    Whole group          37.80    51.10    16.380    14.955    13.30     4.24       100

Total score         Men                 145.93   192.36    40.950    46.700    46.43     3.96       100
                    Women               165.86   208.59    51.640    47.950    42.73     2.84       100
                    Whole group         154.70   199.50    47.010    47.930    44.80     4.72       100

1 Differences between means.
2 Garrett, Henry E.: Statistics in Psychology and Education. Table XIV, p. 134.


The younger group made higher average scores on both tests in 
every case except the first test in Arithmetic, where there is a dif- 
ference of .80 in favor of the older group. The same is true in gains 
made on the second test as the younger group surpassed the older in 
all except Artificial Language and Opposites. However, with the
possible exception of Arithmetic, the differences in gains between the
two groups are small. What is significant, perhaps, is the fact that 
the younger group consistently made higher scores on the two tests, 
this superiority standing out particularly in the case of the total scores 
where the differences were 36.60 and 40.20. In the matter of true
differences seven of the twelve comparisons indicate complete
reliability.


TABLE II.—GAINS IN SCORES BY AGE GROUPS

                                              Means       Standard deviations                     Chances in one hundred
Tests               Groups               First   Second     First    Second    Gains   Critical   of a true difference
                                          test     test      test      test             ratios    greater than zero

Completion          Under 18             30.60    44.80    10.725     9.390    14.20     4.98       100
                    Over 18              24.00    38.00    11.055    12.570    14.00     4.18       100

Artificial          Under 18             46.20    50.20    16.105    16.115     4.00      .88        82
language            Over 18              33.80    40.60    10.945    10.910     6.80     2.20        99

Analogies           Under 18             31.60    40.00    12.160     8.830     8.40     2.79       100
                    Over 18              22.80    30.40    11.975    11.895     7.60     2.25        99

Arithmetic          Under 18             21.80    29.40    11.090    18.660     7.60     1.75        96
                    Over 18              22.60    23.20    12.190    11.340      .60      .18        58

Opposites           Under 18             42.20    55.40    16.030    14.335    13.20     3.07       100
                    Over 18              33.40    46.80    15.525    14.315    13.40     3.17       100

Total score         Under 18            173.00   219.20    49.110    43.290    46.20     3.53       100
                    Over 18             136.40   179.00    37.590    41.790    42.60     3.79       100

In Table III the total group is divided into four subgroups by 
quarters in terms of total scores on the first test in order to determine 
the relative improvement. The four divisions in scores were as 
follows: First group—seventy-four to one hundred fifteen; second 
group—one hundred sixteen to one hundred forty-five; third group—
one hundred forty-six to one hundred eighty-eight; and the fourth
group—one hundred eighty-nine to two hundred sixty-eight. 

The largest gains were made by the second or third group in every
case—the second group in Artificial Language, Opposites and Total
Score and the third in Completion, Analogies, and Arithmetic. The
first group also showed larger gains than the fourth in all but the
Arithmetic test. In the case of the highest group it should be borne
in mind, however, that there was less possibility of large improvement
because of the high level on the first test. In ten of the twenty-four
comparisons the critical ratios indicate complete reliability of the
differences.


TABLE III.—GAINS IN SCORES BY GROUPS IN TERMS OF TOTAL SCORES

                                              Means       Standard deviations                     Chances in one hundred
Tests               Groups¹              First   Second     First    Second    Gains   Critical   of a true difference
                                          test     test      test      test             ratios    greater than zero

Completion          First                17.36    30.21     7.670     8.985    12.85     4.07       100
                    Second               26.09    42.91     7.635     9.000    16.82     4.73       100
                    Third                27.38    44.31     6.640     7.235    16.93     6.21       100
                    Fourth               40.33    49.92     8.975    10.300     9.59     2.43        99

Artificial          First                30.57    37.71     9.340     8.835     7.14     2.08        98
language            Second               31.09    40.64     9.000    11.695     9.55     2.15        98
                    Third                42.38    43.15    12.780    15.085      .77      .14        56
                    Fourth               56.58    61.17    11.805     8.860     4.59     1.08        86

Analogies           First                19.50    25.93     7.500     9.295     6.43     2.01        98
                    Second               21.09    28.82    11.645    10.505     7.73     1.63        94
                    Third                30.46    42.77    13.355     7.300    12.31     2.92       100
                    Fourth               38.25    43.67     8.195     4.715     5.42     1.99        98

Arithmetic          First                13.43    14.14     6.390     9.585      .71      .23        60
                    Second               18.36    20.64    11.890     8.815     2.28      .51        69
                    Third                27.77    35.46     5.130    11.990     7.69     2.13        98
                    Fourth               29.92    35.75    12.820    17.810     5.83      .92        82

Opposites           First                22.36    37.36    11.255     8.550    15.00     3.97       100
                    Second               34.27    53.36     8.885    12.630    19.09     4.10       100
                    Third                41.23    50.08     8.955     9.715     8.85     2.41        99
                    Fourth               55.33    66.17    14.190    12.045    10.84     2.02        98

Total score         First               101.64   145.57    10.770    20.220    43.93     7.17       100
                    Second              131.55   187.45    11.565    28.080    55.90     6.11       100
                    Third               170.46   215.46    13.350    25.065    45.00     5.71       100
                    Fourth              220.75   254.49    22.470    22.275    33.74     3.69       100

1 Grouped by quarters in terms of total scores.

Table IV gives the coefficients of correlation between the scores 
on the two tests, the probable errors of these coefficients, and the 
official reliabilities of the whole examination and its subtests. The
coefficients are practically identical with the test reliabilities in
Completion and Opposites, but considerably lower in the others.


TABLE IV.—CORRELATIONS BETWEEN THE TWO SETS OF SCORES

                        Coefficients of
                        correlation¹ between    Probable errors of    Official reliabilities²
Tests                   the scores on the       the coefficients      of the tests
                        two tests

Completion                    .82                    .032                   .81
Artificial language           .69                    .050                   .98
Analogies                     .69                    .049                   .86
Arithmetic                    .70                    .048                   .82
Opposites                     .86                    .025                   .87
Total score                   .88                    .021                   .95

1 Pearson Product-Moment r.
2 Manual of Instructions, 1933 Edition, pp. 2-4.
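
The probable errors in Table IV are close to what the conventional formula for the probable error of a product-moment coefficient gives with fifty cases. The text does not name the formula, so the sketch below is an assumption checked against the table and not a statement of the author's procedure.

```python
import math

def probable_error_of_r(r, n):
    """Conventional probable error of a correlation: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r ** 2) / math.sqrt(n)

N = 50  # number of students retested
pe_artificial_language = probable_error_of_r(0.69, N)
pe_opposites = probable_error_of_r(0.86, N)
```

Rounded to three places these give .050 and .025, matching the tabled entries. The remaining rows agree to within .001, a gap that may reflect rounding of the coefficients before the probable errors were computed.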


SUMMARY AND CONCLUSIONS 


1. This study presents data analyzed according to individual, sex, 
age, and group differences for fifty university students retested, after 
four years of college work, on the 1931 edition of the American Council 
Psychological Examination for College Freshmen. 

2. Every individual gained in total score, although there were 
some losses on subtest scores. 

3. The men gained more than the women in all except the Oppo- 
sites test, but the women had higher average scores on both tests in 
every case except Arithmetic. 

4. The younger group made higher average scores in all except 
Arithmetic on the first test, and gained more than the older group on 
all but the Artificial Language and Opposites tests, although the 
differences were small. 










5. In the case of ability groupings the largest gains were made by 
the two middle groups in every instance. The first, or lowest group, 
made greater gains than the highest group in all but Arithmetic, but 
it is evident that the latter had less chance to increase because of the 
higher initial scores. 

6. The coefficients of correlation are almost identical with test
reliabilities for Completion and Opposites, but lower for the other
tests.











THE HANDWRITING OF NEGROES 
THOMAS R. GARTH, MARY J. MITCHELL, AND CORRINE N. ANTHONY 


University of Denver 


The purpose of this study is to test the hypothesis of racial dif- 
ferences as it may involve the commonplace facts of handwriting. 
The question of differences may be settled empirically only by meas- 
uring some performance of two racial peoples when an effort has been 
made to control nurtural influences. Handwriting seems readily to 
lend itself to the experiment for the reason that samples may be 
obtained easily from school children of different racial groups with 
approximate similarity of opportunity as regards training in the 
common schools. 

When such samples have been obtained, two methods of pro- 
cedure offer themselves: One, that of measuring the samples directly 
with various standard scales of measurement of handwriting for 
comparison; the other, that of determining characterological differ- 
ences or similarities between the samples. In the present study 
we are following the precedent set by studies of the senior collaborator¹
and Elizabeth Weisser² when they measured the handwriting of
Indians, full and mixed bloods, by means of the Thorndike Hand- 
writing Scale and the Freeman Scale for the Diagnosis of Handwriting 
and compared the result with those obtained from measuring samples 
of handwriting of whites. Results of a characterological study of 
racial differences in handwriting, now in process, yet remain to be 
reported on. This is the third study of racial differences in hand- 
writing so far appearing in the literature. 


PROBLEM 


We have asked ourselves these questions: 

1. Do Negro children write as legibly as white children? 

2. Do they write with the same speed? 

3. Are there any other measurable differences, such as: Differences 
in slant, spacing, quality of line, alinement, letter formation? 


4. Does this study indicate that these Negro children are educa- 
tionally retarded? 


MATERIALS 


The Thorndike Handwriting Scale and the Freeman Scale for
Diagnosis of Handwriting were used in this study for measuring
samples of handwriting of whites and Negroes. Most of the samples
were judged by at least two individuals. 


COMPOSITION OF THE GROUPS 


There were two groups of Negroes and whites measured. The
first, hereafter designated Group I, was composed of three hundred
ninety white children and three hundred sixty-six Negro children from
a large city school system in the Middle West and the second, Group
II, was composed of one hundred twenty-five white children from
various Western school communities and one hundred eighty-four Negro
children from a large city school system of the West. This made a
total of five hundred fifteen whites and five hundred fifty Negroes.
The second group is to be regarded as merely a check on the results
for the first or larger group, in so far as possible (see Table I). The
children ranged from the fourth to eighth grade in Group I, and fourth
to sixth grade in Group II. In Group I the Negro children had Negro
teachers, and in Group II the Negro children had white teachers.


THE RESULTS OF THE MEASUREMENTS 


Table I shows, besides the composition of the groups, the average 
ages, the medians, and measures of reliability for certain groups as 
well as measures of difference. To be explicit, here we have, beside 
that of age, measures of legibility and speed for the five hundred 
fifteen whites and five hundred fifty Negroes grade by grade; we have 
also the consideration of differences and the significance of same. Also, 
we have measures of slant, space, and quality of line for the same 
groups and consideration of differences and their significance for the 
same groups. As to the measures of alinement and letter formation 
we have results for Group I only, that is three hundred ninety whites 
and three hundred sixty-six Negroes. 


INTERPRETATION 


It will be seen upon examination of Table I that the white and 
Negro subjects were practically the same age for each grade. While 
occasionally the median age is higher for the Negroes than for the 
whites, but not consistently grade for grade, the differences are not 
found to be significant when the test of a difference is applied. 

TABLE I.—SHOWING COMPOSITION OF THE GROUPS

                                         Group I                               Group II
Grade                        IV        V       VI      VII     VIII         IV        V       VI

Number*
  Whites                     93       89       75       38       95         64       33       28
  Negroes                    76      103       72       53       52         58       54       72
Age
  Whites
    Median years          10.50    11.30    13.00    13.13    13.61       9.52    10.67    11.82
    P.E.                   1.70     1.81     1.82     1.71     1.42        .46      .37      .47
  Negroes
    Median years          10.20    11.80    12.61    13.81    14.12       9.67    10.50    11.53
    P.E.                   3.80     2.60     1.92     1.72     1.83        .55      .45      .45
  D/P.E. diff.              .52      .13      .81      .73     1.39       1.90     1.00      .93
Legibility
  Whites
    Median score           9.01    10.51    11.62    12.56    13.01       8.42     8.91     9.17
    P.E.                   3.11     1.62     3.12     2.12     1.31        .71      .61      .47
  Negroes
    Median score           9.92    10.32    10.31    11.01    10.82       8.25     8.66     9.30
    P.E.                   2.61     3.14     2.11     2.92     1.91        .65      .59      .38
  D/P.E. diff.             1.72      .47     2.32     2.30     6.87       1.21     1.56     1.75
  Starch's norms           8.71     9.32     9.84    10.84    10.92
Speed
  Whites
    Median score          51.02    59.06    64.52    60.04    65.03      41.30    47.50    65.02
    P.E.                   3.12     3.31     3.12     2.22     2.16      11.52     8.15     8.65
  Negroes
    Median score          45.09    62.05    65.01    84.06    76.02      30.40    42.41    42.51
    P.E.                   2.11     3.15     2.52     1.83     1.32       9.25     8.70     7.97
  D/P.E. diff.            12.22     5.43      .87    42.81    32.35       4.23     2.24     9.57
  Starch's norms          47.00    57.00    65.00    75.00    83.00
Slant
  Whites
    Median score           2.81     3.42     3.82     3.91     4.11       3.04     3.41     3.28
    P.E.                    .91     1.12     1.61      .72     2.11       1.01      .44      .58
  Negroes
    Median score           2.51     3.12     3.74     4.11     3.81       1.74     2.87     3.50
    P.E.                   1.11      .72     2.31     1.42      .92        .97      .92      .60
  D/P.E. diff.             1.52     2.07      .23      .76     3.00       6.18     3.41     1.37
Space
  Whites
    Median score           2.91     3.72     3.61     3.22     3.84       3.36     3.37     3.07
    P.E.                   1.32     1.51     1.32     1.34     1.22        .84      .36      .88
  Negroes
    Median score           2.81     3.62     3.51     3.84     3.71       3.99     3.21     3.03
    P.E.                   1.81     1.32     1.22     2.16     1.17       1.91      .96 [illeg.]
  D/P.E. diff.              .33      .42 [illeg.]     1.36      .42       1.85      .88      .18
Quality of line
  Whites
    Median score           2.80     3.80     4.10     3.50     3.40       1.86     2.87     2.99
    P.E.                   1.81     1.12     2.10     1.10     1.21        .90      .94      .97
  Negroes
    Median score           2.51     2.91     3.16     4.12     3.94       1.57     1.87     3.03
    P.E.                   1.61     1.42     1.73     1.92     1.81        .28      .83      .98
  D/P.E. diff.              .96     4.04     2.43     1.50     1.42       1.95     4.16      .11
Alinement
  Whites
    Median score           2.91     3.62     3.74     3.81     3.34
    P.E.                   1.51      .10      .62 [illeg.]      .92
  Negroes
    Median score           2.44     3.21     3.62     3.84     3.91
    P.E.                   1.61     2.12     3.14     2.21     1.91
  D/P.E. diff.             2.17     1.66      .21      .08     1.11
Letter formation
  Whites
    Median score           2.61     3.71     3.42     3.42     3.50
    P.E.                   1.81     1.62 [illeg.]     2.13     1.82
  Negroes
    Median score       [illeg.] [illeg.]     3.62     3.54     3.62
    P.E.               [illeg.] [illeg.]     1.21     2.34     1.52
  D/P.E. diff.              .81     2.30      .83 [illeg.]      .29

* Number of whites: Group I, 390; Group II, 125; total 515. Negroes: Group I, 366; Group II,
184; total 550.


As to legibility, the Negroes were found to write about as legibly
as the whites in both groups, except in the eighth grade of Group I
where the whites excel the Negroes very definitely. The medians in
this instance are 13.01 and 10.82, respectively, and the test of a
difference is 6.87. Here the Negro group is rather small, fifty-four,
and consequently the finding is thereby less important. At least, it is
an exception to the rather consistent finding of no significant differ- 
ences. It will be noted besides that these whites are above the 
norm for whites reported by Starch,³ which is a score of 10.92 for the
eighth grade. When we compare the Negro median of 10.82 with 
Starch’s norm, the Negroes of the eighth grade seem to do about as 
well at least as Starch’s subjects. 

In the matter of speed we find the whites have the better score in 
the case of the fourth grade of Group I and in the fourth and sixth grades 
of Group II. In grades V, VII, and VIII of Group I the Negroes excel 
the whites. In the sixth grade of Group I and the fifth grade of 
Group II the races measure about the same. As judged by the results, 
we may say that neither race is consistently superior to the other. 
Neither are the results consistent when we compare the Negro scores 
with the Starch norms, for here the Negroes of Group I equal or excel 
the whites in the sixth and eighth grades. 

In the case of slant, there is to be found no significant difference 
except in the fourth grade of Group II. There is an approximation 
to a difference in the fifth grade of Group II. 

In vain do we look for differences of significant character in the
racial measures of space. But in the measures of quality of line the 
whites excel the Negroes in the fifth grades of both Group I and 
Group II, and the measures of difference are significant. Still the 
differences are not here consistently in favor of one race over the other 
from grade to grade in the groups. 

The measures of alinement and of letter formation show no real 
differences between the racial groups. 


SUMMARY 


1. The whites and Negroes are about the same age for each 
grade. 

2. The whites and Negroes write with about equal legibility 
regardless of the fact that in over half of the cases the Negroes had 
Negro teachers. 

3. The Negroes are about equal to the whites in speed. 

4. There are found no consistent tendencies to differences in slant 
which may be regarded as racial. 










5. The two races, white and Negro, have nearly equal scores 
in space. 

6. The quality of line of the two races as measured is about 
the same with the exception of the fifth grades, where the whites excel. 
This is an isolated case and can hardly justify a belief in a difference 
which would be racial. 

7. In alinement and letter formation the measures of the whites and
Negroes are practically identical.

8. The study of handwriting of whites and Negroes reveals no 
thorough-going evidence of racial differences. 

9. The Negro shows no retardation educationally in so far as 
the results of this study go. 

10. The conclusion then is that if whites and Negroes are given
the same training in handwriting, they probably will be found to be
the same in performance.


BIBLIOGRAPHY 


1. Garth, T. R.: “The Handwriting of Indians,” Jour. Ed. Psy., 1931, pp. 706-709.

2. Weisser, Elizabeth: ‘‘A Diagnostic Study of Indian Handwriting,” Jour. Ed. 
Psy., 1932, pp. 703-707. 

3. Starch, D.: Educational Psychology, Macmillan, New York, 1927. 








BOOK REVIEWS 


L. L. Thurstone. Primary Mental Abilities. Chicago: University of
Chicago Press, 1938, pp. ix + 121.


This volume, the first in the new Psychometric Monographs series, 
contains the results of the first extensive application of the new fac- 
torial methods. An excellent non-statistical introduction to the 
problem of factor analysis is provided in the opening chapter. Then 
follows a description of the experimental investigation which supplies 
the data for analysis. This test experiment is based on a battery of 
fifty-seven (mistakenly stated as fifty-six in a number of places in the 
monograph) tests which were administered to two hundred forty 
volunteer college students, somewhat more homogeneous as regards 
ACE tested intelligence than a typical college group. The aim was to 
include a large variety of pencil-and-paper tests covering a wide range 
of abilities, hence the usual types of subtests in current use were 
included, plus many others, some of which were especially constructed 
for the experiment. A description of the fifty-seven tests is given, 
and illustrative items from each are included except for the more 
common tests. The odd-even reliabilities range from .50 to .98 with 
thirty-nine coefficients above .85. 

For the sake of economy, all intercorrelations (and reliabilities) 
were computed as tetrachorics, the cut always being near the median. 
The table of intercorrelations was subjected to the centroid method of 
factor analysis, and a total of thirteen centroid factors were extracted. 
These arbitrary axes were then rotated in order to isolate the primary 
abilities; and in attempting to assign meaning to these primary axes, 
those tests having projections greater than .40 on a given axis were 
studied to determine their possible “common elements.” Here is
where mathematical analysis gives way to psychological analysis, 
and thus Thurstone arrives at definitive names for seven factors. 
These factors or primary abilities have to do with visual Space, visual 
Perception, Numbers, Verbal relations, Words, immediate or rote 
Memory, and Induction (capitals mine to denote symbols used by 
Thurstone). It is noted that several of these factors have been found 
previously by other investigators, notably by T. L. Kelley. 
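
The extraction step described above can be illustrated with a minimal sketch of the first centroid factor in its usual textbook form. This is not a reproduction of the monograph's computations: the function name, the fictitious four-test table, and the choice of placing communalities on the diagonal are assumptions made for illustration, and the sign reflection needed before extracting later factors is omitted.

```python
import numpy as np

def first_centroid_factor(r):
    """Return first-centroid loadings and the residual matrix.

    Assumes r is a symmetric correlation table whose diagonal already
    holds communality estimates (an assumption of this sketch).
    """
    r = np.asarray(r, dtype=float)
    column_sums = r.sum(axis=0)          # sum each column of the table
    total = column_sums.sum()            # grand sum of all entries
    loadings = column_sums / np.sqrt(total)
    residual = r - np.outer(loadings, loadings)
    return loadings, residual

# Fictitious one-factor table: r_ij = a_i * a_j, with a_i^2 on the diagonal.
true_loadings = np.array([0.9, 0.8, 0.7, 0.6])
r = np.outer(true_loadings, true_loadings)

loadings, residual = first_centroid_factor(r)
print(loadings)                  # loadings recovered from the column sums
print(np.abs(residual).max())    # largest residual left after one factor
```

For a table generated by a single factor, the column sums recover the loadings and the residuals vanish; with real data the residual table would be reflected and summed again for each further factor.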

The final chapter is devoted to predictions of primary abilities of 
individuals from their original test scores by means of regression 
equations. The appendix gives the score distribution for the fifty-seven
tests, the correlational matrix, the centroid matrix, and the
rotated orthogonal matrix. 

Typical of Thurstone and his mode of attack, a large number of 
hypotheses are suggested for further experimental work, and the 
reader is told that studies are already under way to investigate the 
nature of the isolated factors. It is planned ultimately to construct 
“‘pure”’ tests of the several primary abilities. If and when this is 
successfully done, the reviewer sees no logical reason why those who 
are most skeptical of factorial methods would not grant that the 
substitution of seven tests for an original fifty-seven represents a 
desirable economy in measurement. Unfortunately, despite the 
convincingness of Thurstone’s monograph, the skeptic can still point 
to the subjective nature of the interpretations involved in assigning 
psychological meaning to the so-called primary abilities. This lack 
of objectivity, however, cannot be deemed any more serious than 
subjective and introspective interpretations in other fields of 
psychology. 

On page vi of the preface, Thurstone states that so far he has not 
found any evidence for a general common factor in Spearman’s sense. 
It seems to the reviewer that no argument has been put forth which 
would deter Spearman and his followers from claiming that the first 
centroid factor corresponds to their common factor. Of course, there 
is nothing unique about the first centroid factor except that it alone 
accounts for a large part of the variance of the several tests, and 
certainly such a g is ill-defined psychologically, but perhaps no more 
so than Spearman’s general factor. 

In this monograph a new criterion is proposed, and used, for 
determining how many centroid factors to extract. Had the discarded 
criterion (residual variance reduced to sampling variance of original 
mean intercorrelation) been used, only three factors would have been 
determined. The new criterion may be more valid, but certainly a 
part of the logic which led to its adoption is open to serious question. 
This new criterion was arrived at empirically by adding to each of the 
r’s in tables of fictitious intercorrelations of known factorial origin a 
variable random error, the possible magnitude of which depended 
partly upon the size of a given r, since the sampling error of r is a
function of its magnitude. Now this procedure does not allow for the 
fact, analytically demonstrated by Pearson and Filon in 1898,¹ that
the sampling errors of correlation coefficients for a system of
intercorrelated variables are not independent. If r₁₂ is, by chance,
considerably higher than its universe value, the other r’s in the first
column (and row) of the correlation matrix will tend, in general, to be
somewhat higher than their respective universe values. Now when
the columns of the correlation matrix are summed, as in the centroid
method, the chance errors injected by Thurstone will tend more
nearly to balance than would be the case were correlated chance errors
involved. The reviewer is unable to say to what extent the new
criterion is invalidated because of the failure to allow for the proper
operation of chance.

¹ Pearson, K. and Filon, L. N. G.: “On the probable errors of frequency
constants and the influence of random selection on variation and
correlation.” Phil. Trans. Roy. Soc., 1898, 191A, pp. 229-311.
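
The dependence among sampling errors noted above can be checked with a small simulation. The sketch below is illustrative only: it uses product-moment correlations on normal data rather than the tetrachorics of the monograph, it assumes three variables with a common universe correlation of .50, and the function name, seed, and number of replications are arbitrary choices. Only the sample size of 240 is taken from the study under review.

```python
import numpy as np

def simulate_r_errors(rho=0.5, n_cases=240, n_reps=2000, seed=0):
    """Draw repeated samples and record the errors of r12 and r13."""
    rng = np.random.default_rng(seed)
    cov = np.full((3, 3), rho)
    np.fill_diagonal(cov, 1.0)           # universe correlation matrix
    errors = np.empty((n_reps, 2))
    for k in range(n_reps):
        x = rng.multivariate_normal(np.zeros(3), cov, size=n_cases)
        r = np.corrcoef(x, rowvar=False)  # sample correlation matrix
        errors[k] = (r[0, 1] - rho, r[0, 2] - rho)
    return errors

errors = simulate_r_errors()
# Correlation between the sampling errors of r12 and r13 across samples.
error_corr = np.corrcoef(errors, rowvar=False)[0, 1]
print(round(error_corr, 2))
```

A clearly positive value here means that when r12 overshoots its universe value, r13 tends to overshoot as well, so errors in a column sum reinforce rather than cancel one another.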

Moreover, it seems extremely unlikely that an adequate criterion 
will be found which is independent of the number of cases in the 
sample. All other statistical formulas which are concerned with 
sampling involve some function of N. In the application of the new 
criterion to the analysis of the fifty-seven tests, we note that the 
criterion value approaches its limiting value of .982 in the following 
manner: .556, .878, .941, .962, .959, .970, .974, .974, .978, .966, .960, 
.972, and .980, from which it is concluded that thirteen factors are 
needed. Perhaps the shade of difference between .970 and .980 means 
something, but until the criterion is more rigorously determined it 
seems a bit fortuitous to accept such a convergence as being of suffi- 
cient significance to indicate that more than six centroid factors are 
justified. What would happen if another sample were drawn? 
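
The arithmetic behind this doubt can be laid out directly from the thirteen criterion values quoted above. The short sketch below only tabulates those values; the variable names are illustrative, and nothing in it reproduces how the criterion itself is computed.

```python
# Criterion values quoted in the review, one per centroid factor extracted.
criterion = [0.556, 0.878, 0.941, 0.962, 0.959, 0.970, 0.974,
             0.974, 0.978, 0.966, 0.960, 0.972, 0.980]
LIMIT = 0.982  # limiting value stated in the review

# Change from the previous factor, and remaining gap to the limit.
for k in range(1, len(criterion)):
    change = criterion[k] - criterion[k - 1]
    gap = LIMIT - criterion[k]
    print(f"factors={k + 1:2d}  value={criterion[k]:.3f}  "
          f"change={change:+.3f}  gap={gap:.3f}")

tail = criterion[5:]                      # sixth factor onward
tail_range = max(tail) - min(tail)        # width of the band they occupy
n_decreases = sum(b < a for a, b in zip(criterion, criterion[1:]))
```

From the sixth factor onward the values stay within a band of .020, and the sequence falls rather than rises at three steps, which is the irregularity the reviewer has in mind.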

It might be claimed that, because the author has succeeded in 
finding psychological meaning for seven, and possibly two more, 
factors, it can be assumed that more than chance is operating. At this 
point one wonders how often scientists have succeeded in supplying a 
rational interpretation for something which was later proved to have 
been determined by error or chance. Aside from the subjectivity 
involved in giving meaning to the factors, one also notes a disturbing 
arbitrariness in the selection of tests utilized in rationalizing some of 
the factors. For instance, in considering the “verbal relations” factor
the two vocabulary tests with projections on this reference axis of 
.395 and .385 are listed, while Disarranged Sentences and Spelling, 
with projections of .395 and .386, are ignored. 

The points which the reviewer has raised are not to be regarded as 
major criticisms of a study which can very properly be considered as 
the outstanding experimental contribution in an important field of 
investigation. So far as the factorial methods are concerned it would
seem that the weakest links in an otherwise strong chain are those
which bridge the gap between mathematical factors and psychological 
meaning and the chasm between sample and universe values. In fact 
the latter link is practically missing, and from the reviewer’s viewpoint 
this constitutes a serious limitation. This not only has to do with 
how many centroid factors are of non-chance significance, but also 
with the stability of test projections. 

This monograph is an excellent example of what can be done by 
factorial methods provided an investigator is willing to plan his 
research on an extensive scale. It cannot be said that Thurstone’s 
basic data were faulty, or that he engineered his set-up so as to obtain 
seven factors, or that he generalizes too far beyond his findings, or 
that he is committed to any preconceived notion as to the organization 
of ability, or that he believes this study settles once and for all time 
the question as to the primary abilities of man. 


Quinn McNemar.
Fordham University. 


Arthur E. Traxler. The Use of Tests and Rating Devices in the
Appraisal of Personality. New York City: Educational Records
Bureau, 1938.


This succinct digest of information about personality tests is the 
most practical, most useful, and most profound summary which has 
come to the attention of this reviewer. The literature in regard to 
personality tests has grown so voluminous that only the expert in the 
field can hope to keep abreast of it. Several very complete bibliogra- 
phies of material in this field have been published in recent years, but 
these too, by their very exhaustiveness, tend to discourage the teacher 
or counselor who wishes practical information. The present bulletin 
fills the gap in admirable fashion. 

Traxler assembled this material for the Educational Records
Bureau, with the thought that it would be useful to school
psychologists, counselors, and college advisors. There is a clear and
remarkably thorough discussion—considering its brevity—of the 
different procedures for measuring and appraising behavior and 
personality traits and patterns. This is followed by a well-selected 
list of forty-five of the most useful tests, rating scales, and the like. 
Each of these is described in sufficient detail to enable the reader to 
decide whether it would be valuable in a particular group. This
carefully annotated list of forty-five instruments is much to be
commended. Since the author mentions that there are now between four
and five hundred tests and inventories relating to personality, this 
list may be taken to indicate that only one out of ten is genuinely 
useful. Probably this is not far from the truth. 

In the closing pages of the bulletin the author suggests various 
aspects of personality which have received the most attention by test 
makers, and gives references for further reading. 

Adequately to summarize a field so complex as that of personality 
testing, much more is required than merely boiling down factual 
material into concise, usable form. The author must in addition see
clearly the fundamental issues and problems in the field, and must use
sound judgment in the selection of significant material. Traxler has
accomplished these aims in unusual fashion and his bulletin is com- 
mended to all those who are using, or are interested in using, per- 
sonality tests, whether in school, college, or behavior clinic. 

Carl R. Rogers.
Director, Child Study Department, Rochester Society for the 
Prevention of Cruelty to Children. 


D. Paterson, G. Schneidler, and E. Williamson. Student Guidance
Techniques. New York: McGraw-Hill Book Company, Inc.,
1938, pp. 316.


The authors of Student Personnel Procedures and Techniques, which 
was published in a mimeographed edition for the use of faculty coun- 
selors at the University of Minnesota, have revised their material and 
have brought out an enlarged edition in book form. Teachers who
are giving courses in counseling will find this handbook valuable, and
those who are engaged actively in guidance will discover some tech- 
niques which will help them in the diagnosis of the problems of students. 

The book stresses the complexity of student problems and points 
out that the counselor “is confronted not with a single problem, but
with a variety of problems concerning educational, vocational,
emotional, social, economic and health adjustments.” The stand is taken
that the student is usually unable to diagnose his own abilities and 
that in consequence there is a need for the trained diagnostician, 
skilled in the use of diagnostic techniques, such as the interview; 
different methods of the discovery of the lack of emotional adjustment, 
as evidenced by motor disturbances, attitudes and modes of conduct;
cumulative records; and various measurement techniques. Only a
few pages are devoted to a description of techniques other than tests, 
scant space being given to rating scales and observation. The anec- 
dotal method of recording observations is mentioned, but there is no 
reference to the use of the autobiography, the life history or the 
questionnaire. The case study is indicated rather than presented in
detail. The greater part of the book, that is some two hundred pages, 
is concerned with the tests of scholastic aptitude, academic achieve- 
ment, personality and special aptitude. In fact, one of the chief 
values of the book is the conciseness and clarity with which information 
regarding tests is given under the following headings: description; 
designed for; norms; reliability and validity. 

But a book which is planned to assist counselors in solving intricate 
problems of adjustment should not ignore to so great an extent the 
other techniques through the use of which one gains some insight into 
the difficulties besetting students. Tests are only one method of 
approach, and one which is still open to much controversial dispute. 
A counselor who has become skilled in the technique of observation 
may get a better conception of a particular individual by a long time 
series of anecdotal observations than by the quicker snapshots obtained 
through tests. 

If the problems presented by students are complex, involving so 
many facets, it is unfortunate that the discussion should be more or less 
limited to those concerned with educational guidance. A short chap- 
ter towards the end of the book is reserved for personal problems and 
only eighteen pages suffice for a treatment of vocational problems. 
After all, have not the various choices made by the student regarding 
curricula, courses and schools been made in order that he may live 
an adjusted life in a world of work? Then, too, one is somewhat 
puzzled by the statement that students must be diagnosed and
counseled before they are given occupational information, since “it is
decidedly unfair to stimulate students to think of their occupational 
future unless they have first been given an understanding of their own 
assets and liabilities.” Surely, everyone knows that occupational 
information is no longer a part of the school curriculum solely for the 
purpose of enabling students to make vocational choices, but it is 
also for the purpose of acquainting them with the economic and social 
milieu in which they live. 

A stronger emphasis might well have been placed on the importance 
of trying to interpret one’s findings correctly and, likewise, on the
difficulty of interpreting them wisely, irrespective of the technique
employed. This should have been done, not with the aim of
discouraging but with the aim of enlightening the counselor. Greater
stress might also have been laid on the counselor’s need of a
broad cultural background and a rich experience in life contacts, for 
any technique in the hands of a well-prepared counselor becomes an 
instrument of stronger diagnostic power. However, in spite of these 
criticisms, the book is recommended to those who are interested in 
personnel work, but with the final caution that tests do not tell all 
the story, especially those tests concerned with personality and 
aptitudes.

Mary Theresa Scudder.
Instructor in Counseling Techniques,
University of Delaware (Summer Session).


Correction 


In the October Journal, in “Relationships between three multiple
orthogonal factors and four bifactors,” by Karl J. Holzinger, the text
on page 515 after equation (1) should read: “Where ‘h’ is the square
root of the Communality or the square root of the sum of the squares of
the factor loadings for each variable.”











